Interview Scorecards: Writing Them Well

60-Second Summary

A scorecard turns 'did you like them?' into 'did they meet the bar on X?'
Build before you source. Editing scorecards mid-loop is how bias enters.
5–7 dimensions, each with concrete evidence. More dimensions = less signal.
Written debriefs anchored on the scorecard prevent the loudest voice from winning.

Structured interviews consistently predict job performance better than unstructured ones. The gap was already visible in Schmidt & Hunter (1998), and the Sackett et al. (2022) re-analysis puts structured interviews at roughly twice the validity of unstructured ones. The scorecard is the artifact that makes structure real. Without one, your hiring loop is a vibes-based committee, and the legal exposure is the cherry on top.

Why scorecards matter

Force the hiring manager to define 'great' before meeting candidates
Give every interviewer the same target, so feedback is comparable
Reduce affinity bias by anchoring on evidence, not impression
Create a defensible record under EEOC/Title VII and UK Equality Act scrutiny
Make debriefs about evidence, not who spoke last or loudest

Validated, not just rigorous

Structured interviewing has been validated across decades of meta-analyses. Google's structured interviewing guidance on re:Work and the US OPM's structured interview guide both anchor on the same idea: define competencies, ask consistent questions, rate against behavioral anchors.

Anatomy of a scorecard

What a complete scorecard contains

1. Role context: One-line mission for the role, level, and the team it sits on.
2. Outcomes (3–5): What success in 12 months looks like, measurable wherever possible.
3. Must-have competencies: The 4–6 competencies the loop will actually assess. Not 20.
4. Nice-to-haves: Explicitly separated so they don't sneak into rejection rationale.
5. Anti-signals: Behaviors that should count against a candidate regardless of other strengths (e.g., punching down in a behavioral story).
6. Loop map: Which interviewer assesses which competency, with the question bank.
7. Rating scale + anchors: A defined scale with behavioral examples per level.
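
The seven parts above map naturally onto a small data structure. What follows is a minimal sketch, not a schema from any ATS: the class names, field names, and the `problems` check are all hypothetical, chosen only to mirror the list.

```python
from dataclasses import dataclass, field


@dataclass
class Competency:
    """One must-have competency (hypothetical shape, for illustration)."""
    name: str                                               # observable behavior, not a trait
    bar: str                                                # what meeting the bar looks like at this level
    interviewers: list[str] = field(default_factory=list)   # loop map entry
    questions: list[str] = field(default_factory=list)      # question bank


@dataclass
class Scorecard:
    """Illustrative container for the seven parts listed above."""
    role_context: str                                       # mission, level, team
    outcomes: list[str]                                     # 3-5 twelve-month outcomes
    must_haves: list[Competency]                            # 4-6 competencies the loop assesses
    nice_to_haves: list[str] = field(default_factory=list)
    anti_signals: list[str] = field(default_factory=list)
    scale_anchors: dict[int, str] = field(default_factory=dict)

    def problems(self) -> list[str]:
        """Return human-readable gaps against the guidance in this section."""
        issues = []
        if not 3 <= len(self.outcomes) <= 5:
            issues.append("expected 3-5 outcomes")
        if not 4 <= len(self.must_haves) <= 6:
            issues.append("expected 4-6 must-have competencies")
        if not self.scale_anchors:
            issues.append("rating scale has no behavioral anchors")
        return issues
```

Running `problems()` before the role opens is one cheap way to catch a scorecard that has drifted to 12 competencies or shipped without anchors.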

Defining competencies

A competency is a cluster of observable behavior tied to job performance — not a personality trait. 'Smart' is not a competency. 'Decomposes ambiguous problems into testable hypotheses' is.

Weak vs strong competency definitions

Weak (avoid)

Smart
Cultural fit
Good communicator
Self-starter
Passionate

Strong (use)

Decomposes ambiguous problems into testable hypotheses
Adapts message to technical vs non-technical audiences
Names disagreement with peers and resolves it without escalation
Identifies missing work and starts it without being asked
Asks clarifying questions before solving

'Culture fit' is a legal landmine

It is vague, varies by interviewer, and correlates with affinity bias. Replace it with 'values alignment' tied to 2–3 specific written values, each with a behavioral anchor. Subjective, undefined criteria are also the hardest to defend if a hiring decision is challenged; EEOC best-practice guidance favors criteria that are job-related and consistently applied.

Rating scales that work

A 4-point scale is the sweet spot: it forces a directional call (no neutral middle), is granular enough to distinguish candidates, and is simple enough that interviewers actually use it consistently.

A 4-point scale with behavioral anchors

Rating	Label	Meaning
1	Strong No Hire	Demonstrated the opposite of the competency, or showed an anti-signal.
2	No Hire	Did not demonstrate the competency at the bar for this level.
3	Hire	Demonstrated the competency at the bar with concrete evidence.
4	Strong Hire	Demonstrated the competency well above the bar, with depth.
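
If your tooling lets you encode the scale, the table above could be sketched as an enum with no neutral value. The names `Rating`, `leans_hire`, and `parse_rating` are assumptions for illustration, not part of any ATS API.

```python
from enum import IntEnum


class Rating(IntEnum):
    """The 4-point scale from the table above; there is deliberately no neutral value."""
    STRONG_NO_HIRE = 1
    NO_HIRE = 2
    HIRE = 3
    STRONG_HIRE = 4

    @property
    def leans_hire(self) -> bool:
        # Every rating carries a direction: 3 and 4 lean hire, 1 and 2 do not.
        return self >= Rating.HIRE


def parse_rating(value: int) -> Rating:
    """Reject anything off the scale (0, 5, 2.5) instead of rounding it."""
    try:
        return Rating(value)
    except ValueError:
        raise ValueError(f"rating must be 1-4, got {value!r}") from None
```

Because every valid value answers `leans_hire` with a plain yes or no, a 'mixed' score simply cannot be recorded.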

Avoid 5-point scales

5-point scales reliably produce a clump at '3 — Mixed', which carries no decision. Force interviewers to choose a direction; the debrief is where nuance lives.

One scorecard, many interviewers

Map each competency to one or two interviewers: no gaps, and no competency covered by the whole panel
Pair each interviewer with 2–3 questions per competency from a shared bank
Every interviewer must submit ratings + written evidence before seeing others' scores
Block scorecard visibility until submission (Greenhouse, Ashby, and Lever all support this)
Debrief is the synthesis — not a re-vote
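
Two of the rules above are mechanical enough to check in code: loop-map coverage and blind submission. The sketch below assumes a loop map stored as a plain dict of competency to interviewer names; both function names are hypothetical, and real ATS products enforce visibility through their own settings rather than anything like this.

```python
def loop_map_gaps(loop_map: dict[str, list[str]]) -> dict[str, str]:
    """Flag competencies with no interviewer (gap) or more than two (overlap)."""
    problems = {}
    for competency, interviewers in loop_map.items():
        n = len(set(interviewers))  # ignore accidental duplicate names
        if n == 0:
            problems[competency] = "gap: no interviewer assigned"
        elif n > 2:
            problems[competency] = f"overlap: {n} interviewers assigned"
    return problems


def visible_scores(
    viewer: str, submitted: dict[str, dict[str, int]]
) -> dict[str, dict[str, int]]:
    """Blind submission: a viewer sees scores only after submitting their own.

    `submitted` maps interviewer name -> {competency: rating}.
    """
    if viewer not in submitted:
        return {}
    return submitted
```

An empty result from `loop_map_gaps` means every competency has one or two owners, which is the state to reach before the first candidate is scheduled.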

Calibration and debrief

Running a clean debrief

1. Pre-read: Hiring manager reads all scorecards before the room opens. Notes patterns.
2. Per-competency walk: For each competency: each interviewer states rating and 1–2 pieces of evidence. No 'feelings' before evidence.
3. Disagreement protocol: When ratings diverge by 2+ points, ask: did we hear different evidence, or interpret the same evidence differently?
4. Decision: Hiring manager makes the call. Recruiter records the rationale tied to the scorecard, not to vibes.
5. Post-mortem after 6 months: Look back at hires vs the bar set in the scorecard. Was the scorecard predictive?
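
The disagreement protocol in step 3 is easy to automate as part of the pre-read. A minimal sketch, assuming ratings on the 1–4 scale are collected per competency and per interviewer; the function name and data shape are hypothetical.

```python
def divergent_competencies(
    ratings: dict[str, dict[str, int]], threshold: int = 2
) -> list[str]:
    """Return competencies where interviewer ratings spread by `threshold` or more.

    `ratings` maps competency -> {interviewer: rating on the 1-4 scale}.
    """
    flagged = []
    for competency, by_interviewer in ratings.items():
        scores = list(by_interviewer.values())
        # A single rating cannot diverge from itself, so require at least two.
        if len(scores) >= 2 and max(scores) - min(scores) >= threshold:
            flagged.append(competency)
    return flagged


ratings = {
    "System design": {"tech lead": 4, "peer engineer": 2},
    "Ownership": {"hiring manager": 3, "peer engineer": 3},
}
flagged = divergent_competencies(ratings)  # ["System design"]
```

The flagged list tells the hiring manager where to spend debrief time. It does not settle the disagreement; that still comes down to whether interviewers heard different evidence or read the same evidence differently.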

Worked example

Example scorecard fragment — Senior Backend Engineer

Competency	Interviewer	Bar	Sample question
System design	Tech lead	Designs a service with explicit trade-offs on consistency vs availability	Design a rate limiter for a public API at 1M RPS.
Decomposition	Peer engineer	Breaks an ambiguous problem into testable hypotheses	Walk me through a time you debugged a production issue with no obvious cause.
Collaboration	Cross-team partner	Names disagreement, resolves without escalation	Tell me about a technical decision you lost. What happened next?
Ownership	Hiring manager	Identifies missing work, starts it without prompting	Describe something you fixed that wasn't your job.

Common mistakes

Scorecards written after the loop is designed (should be the other way around)
Overlapping competencies across interviewers — wastes signal
No behavioral anchors — every interviewer invents their own bar
Allowing 'culture fit' as a competency
Reading other interviewers' scores before submitting your own
Debriefs that revisit 'gut feel' instead of staying on the scorecard
Never auditing whether the scorecard predicted on-the-job performance

References

Schmidt & Hunter (1998), Psychological Bulletin; Sackett et al. (2022), Journal of Applied Psychology: Selection validity meta-analyses
Google re:Work: Structured interviewing
US OPM: Structured Interviews Guide
EEOC: Employer best practices
Greenhouse: Scorecards in practice
Ashby: Interview kits and scorecards

Written by Pawan Joshi. Sources cited inline.

First published 18 Nov 2025.