Playbook
Intermediate · HR · Manager · Founder

Interview Scorecards: Writing Them Well

A scorecard turns 'I liked them' into 'they demonstrated X'. Here's how to write one that calibrates a whole loop, reduces bias, and survives legal scrutiny.

Updated 2026-05-17

Structured interviews are roughly twice as predictive of job performance as unstructured ones in the most recent large meta-analysis (Sackett et al., 2022, which revisits Schmidt & Hunter, 1998). The scorecard is the artifact that makes structure real. Without one, your hiring loop is a vibes-based committee, and it carries legal exposure on top of the weaker signal.

Why scorecards matter

  • Force the hiring manager to define 'great' before meeting candidates
  • Give every interviewer the same target, so feedback is comparable
  • Reduce affinity bias by anchoring on evidence, not impression
  • Create a defensible record under EEOC/Title VII and UK Equality Act scrutiny
  • Make debriefs about evidence, not who spoke last or loudest

Validated, not just rigorous

Structured interviewing has been validated across decades of meta-analyses. Google's re:Work hiring guides and the US OPM's structured interview guide both anchor on the same idea: define competencies, ask consistent questions, rate against behavioral anchors.

Anatomy of a scorecard

What a complete scorecard contains

  1. Role context
    One-line mission for the role, level, and the team it sits on.
  2. Outcomes (3–5)
    What success in 12 months looks like, measurable wherever possible.
  3. Must-have competencies
    The 4–6 competencies the loop will actually assess. Not 20.
  4. Nice-to-haves
    Explicitly separated so they don't sneak into rejection rationale.
  5. Anti-signals
    Behaviors that should down-vote regardless of other strengths (e.g., punching down in a behavioral story).
  6. Loop map
    Which interviewer assesses which competency, with the question bank.
  7. Rating scale + anchors
    A defined scale with behavioral examples per level.
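The seven parts above can be sketched as a small data structure with a completeness check. This is a minimal illustration, not a schema from any applicant tracking system: the class names, field names, and the `problems` method are all hypothetical, and the 3–5 and 4–6 limits are taken from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Competency:
    name: str
    definition: str          # observable behavior, not a personality trait
    interviewers: list[str]  # loop map: who assesses this competency
    questions: list[str] = field(default_factory=list)


@dataclass
class Scorecard:
    role_context: str
    outcomes: list[str]
    must_haves: list[Competency]
    nice_to_haves: list[str] = field(default_factory=list)
    anti_signals: list[str] = field(default_factory=list)
    rating_anchors: dict[int, str] = field(default_factory=dict)

    def problems(self) -> list[str]:
        """Return human-readable gaps; an empty list means nothing is missing."""
        found = []
        if not self.role_context.strip():
            found.append("role context is empty")
        if not 3 <= len(self.outcomes) <= 5:
            found.append(f"expected 3-5 outcomes, got {len(self.outcomes)}")
        if not 4 <= len(self.must_haves) <= 6:
            found.append(
                f"expected 4-6 must-have competencies, got {len(self.must_haves)}"
            )
        for competency in self.must_haves:
            if not competency.interviewers:
                found.append(f"no interviewer mapped to '{competency.name}'")
        if not self.rating_anchors:
            found.append("rating scale has no behavioral anchors")
        return found
```

Running `problems()` before the first interview is one way to catch a card that was never finished, for example one that lists 12 competencies or has no anchors yet.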

Defining competencies

A competency is a cluster of observable behavior tied to job performance — not a personality trait. 'Smart' is not a competency. 'Decomposes ambiguous problems into testable hypotheses' is.

Weak vs strong competency definitions
Weak (avoid)
  • Smart
  • Cultural fit
  • Good communicator
  • Self-starter
  • Passionate
Strong (use)
  • Decomposes ambiguous problems into testable hypotheses
  • Adapts message to technical vs non-technical audiences
  • Names disagreement with peers and resolves it without escalation
  • Identifies missing work and starts it without being asked
  • Asks clarifying questions before solving

'Culture fit' is a legal landmine

It is vague, means something different to each interviewer, and correlates with affinity bias. Replace it with 'values alignment' tied to 2–3 specific written values, each with a behavioral anchor. EEOC guidance treats subjective hiring criteria as higher risk.

Rating scales that work

A 4-point scale is the sweet spot: it forces a directional call (no neutral middle), is granular enough to distinguish candidates, and is simple enough that interviewers actually use it consistently.

A 4-point scale with behavioral anchors

Rating | Label | Meaning
1 | Strong No Hire | Demonstrated the opposite of the competency, or showed an anti-signal.
2 | No Hire | Did not demonstrate the competency at the bar for this level.
3 | Hire | Demonstrated the competency at the bar with concrete evidence.
4 | Strong Hire | Demonstrated the competency well above the bar, with depth.

Avoid 5-point scales

5-point scales reliably produce a clump at '3 — Mixed', which carries no decision. Force interviewers to choose a direction; the debrief is where nuance lives.
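A minimal sketch of the 4-point scale as a type, assuming ratings are stored as integers. The names `Rating`, `leans_hire`, and `parse_rating` are illustrative, not taken from any tool. The point it shows is that a scale with no middle value leaves every rating on one side of the bar or the other.

```python
from enum import IntEnum


class Rating(IntEnum):
    STRONG_NO_HIRE = 1
    NO_HIRE = 2
    HIRE = 3
    STRONG_HIRE = 4

    @property
    def leans_hire(self) -> bool:
        # With four points there is no neutral value: 1-2 lean no, 3-4 lean yes.
        return self >= Rating.HIRE


def parse_rating(value: int) -> Rating:
    """Reject anything outside 1-4, including half points used as a hedge."""
    try:
        return Rating(value)
    except ValueError:
        raise ValueError(f"rating must be 1, 2, 3, or 4, got {value!r}") from None
```

For example, `parse_rating(3).leans_hire` is `True`, while `parse_rating(5)` and `parse_rating(2.5)` both raise `ValueError`.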

One scorecard, many interviewers

  1. Map each competency to one or two interviewers, with no full overlap and no gaps
  2. Pair each interviewer with 2–3 questions per competency from a shared bank
  3. Every interviewer must submit ratings + written evidence before seeing others' scores
  4. Block scorecard visibility until submission (Greenhouse, Ashby, and Lever all support this)
  5. Debrief is the synthesis — not a re-vote
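The first and fourth steps above lend themselves to a quick automated check. The sketch below assumes the loop map is a plain dictionary from interviewer to the competencies they assess; `coverage_problems` and `can_view_scores` are hypothetical helpers, not the API of Greenhouse, Ashby, or Lever, which handle visibility through their own settings.

```python
def coverage_problems(competencies, loop_map, max_assessors=2):
    """Find gaps and excess overlap in a loop map.

    loop_map is assumed to look like {interviewer: [competency, ...]}.
    """
    assessors = {c: [] for c in competencies}
    problems = []
    for interviewer, assigned in loop_map.items():
        for c in assigned:
            if c not in assessors:
                problems.append(f"{interviewer} is assigned unknown competency '{c}'")
            else:
                assessors[c].append(interviewer)
    for c, who in assessors.items():
        if not who:
            problems.append(f"gap: nobody assesses '{c}'")
        elif len(who) > max_assessors:
            problems.append(f"overlap: '{c}' is assessed by {len(who)} interviewers")
    return problems


def can_view_scores(viewer, submitted):
    """Blind submission: other scores stay hidden until the viewer has submitted."""
    return viewer in submitted
```

A loop where the tech lead covers system design, a peer covers decomposition, and the hiring manager covers ownership would report a gap for collaboration, which is the kind of miss that is cheap to fix before the loop runs and expensive to discover in the debrief.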

Calibration and debrief

Running a clean debrief

  1. Pre-read
    Hiring manager reads all scorecards before the room opens. Notes patterns.
  2. Per-competency walk
    For each competency: each interviewer states rating and 1–2 pieces of evidence. No 'feelings' before evidence.
  3. Disagreement protocol
    When ratings diverge by 2+ points, ask: did we hear different evidence, or interpret the same evidence differently?
  4. Decision
    Hiring manager makes the call. Recruiter records the rationale tied to the scorecard, not to vibes.
  5. Post-mortem after 6 months
    Look back at hires vs the bar set in the scorecard. Was the scorecard predictive?
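The disagreement protocol in step 3 can be prepared during the pre-read. The sketch below assumes ratings are collected as a nested dictionary on the 4-point scale; the function name and data shape are illustrative only.

```python
def divergent_competencies(ratings, threshold=2):
    """Flag competencies whose ratings differ by `threshold` points or more.

    ratings is assumed to look like {competency: {interviewer: rating}}.
    Returns {competency: spread} for the ones worth discussing first.
    """
    flagged = {}
    for competency, by_interviewer in ratings.items():
        if len(by_interviewer) < 2:
            continue  # a single rating cannot diverge from itself
        scores = by_interviewer.values()
        spread = max(scores) - min(scores)
        if spread >= threshold:
            flagged[competency] = spread
    return flagged
```

A flagged competency is a prompt for the question in step 3, not a verdict: the spread says that two interviewers disagree, and only the written evidence says why.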

Worked example

Example scorecard fragment: Senior Backend Engineer

Competency | Interviewer | Bar | Sample question
System design | Tech lead | Designs a service with explicit trade-offs on consistency vs availability | Design a rate limiter for a public API at 1M RPS.
Decomposition | Peer engineer | Breaks an ambiguous problem into testable hypotheses | Walk me through a time you debugged a production issue with no obvious cause.
Collaboration | Cross-team partner | Names disagreement, resolves without escalation | Tell me about a technical decision you lost. What happened next?
Ownership | Hiring manager | Identifies missing work, starts it without prompting | Describe something you fixed that wasn't your job.

Common mistakes

  • Scorecards written after the loop is designed (should be the other way around)
  • Overlapping competencies across interviewers — wastes signal
  • No behavioral anchors — every interviewer invents their own bar
  • Allowing 'culture fit' as a competency
  • Reading other interviewers' scores before submitting your own
  • Debriefs that revisit 'gut feel' instead of staying on the scorecard
  • Never auditing whether the scorecard predicted on-the-job performance

Written by Pawan Joshi. Sources cited inline. Last updated 2026-05-17.