Calibration Sessions: The Quiet Engine of Fair Performance Management
Why ratings without calibration are leniency-and-stringency lotteries, and how to run a 90-minute calibration that produces decisions managers can defend to…
- Manager rating tendencies vary by 1–1.5 points on a 5-point scale — calibration normalizes the gap.
- Calibrate in groups of 6–10 peer managers, facilitated by HR.
- Discuss outliers, not every employee. Use the time on the disagreements.
- Force comparison: 'Why is A a 4 and B only a 3?'
- Audit calibration outcomes by protected class — bias compounds in unaudited sessions.
Industrial-organizational psychologists have measured manager rating tendencies for decades. The result is consistent: when 20 untrained managers use the same 5-point performance scale, their ratings of identical performance spread across 1–1.5 points. Calibration is the only known correction.
Why calibration exists
Without calibration, the rating an employee receives is partly a function of who their manager is. Lenient managers protect their teams; stringent managers punish theirs. The result is unfair compensation, unfair promotion, and lost trust in the entire performance system.
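One way to make the manager effect visible before a session is to compare each manager's average proposed rating with the average across all managers. The sketch below is a minimal illustration in Python: the function name `manager_offsets`, the input shape, and the sample data are hypothetical. A mean offset is only a rough signal, since a team can genuinely be stronger or weaker than its peers.

```python
from collections import defaultdict
from statistics import mean


def manager_offsets(proposed):
    """Return each manager's mean rating minus the overall mean rating.

    `proposed` is a list of (manager, rating) pairs on a 5-point scale.
    A positive offset suggests leniency, a negative one stringency; it is
    a prompt for discussion, not proof of bias.
    """
    by_manager = defaultdict(list)
    for manager, rating in proposed:
        by_manager[manager].append(rating)
    overall = mean(rating for _, rating in proposed)
    return {m: round(mean(r) - overall, 2) for m, r in by_manager.items()}


# Hypothetical pre-calibration proposals (names and ratings are made up).
sample = [
    ("Avery", 4), ("Avery", 5), ("Avery", 4),
    ("Blake", 3), ("Blake", 2), ("Blake", 3),
    ("Casey", 3), ("Casey", 4), ("Casey", 3),
]
offsets = manager_offsets(sample)
print(offsets)  # {'Avery': 0.89, 'Blake': -0.78, 'Casey': -0.11}
```

In this made-up sample, Avery's proposals sit well above the group average and Blake's well below it, which is the kind of gap the session is meant to examine.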
Structure of a 90-minute session
- 1. Pre-read (sent 48h before): Manager-proposed ratings for all team members, with one-paragraph rationale per non-meets rating. Forced distribution NOT used as a target.
- 2. Opening (10 min): Facilitator restates definitions of each rating level. Names the bias risks (recency, similarity, leniency, halo).
- 3. Outlier review (60 min): Group focuses on proposed top-rating and bottom-rating cases. Force comparison across teams.
- 4. Distribution check (10 min): Look at final distribution by gender, ethnicity, tenure, team. Flag anything that looks off for HR follow-up.
- 5. Close (10 min): Confirm decisions, owners, timeline for manager-employee conversations. Everything stays confidential to this room.
The facilitator's playbook
- Open with the definitions: people's reading of each rating level drifts over 12 months.
- Never let a manager defend their own team unchallenged. Ask peers: 'Does this rating compare cleanly to someone in your team at the same level?'
- Name biases out loud when you see them. 'I think we're recency-biased here — what did this person do in Q1?'
- Time-box every case to 7 minutes. Endless debate hides indecision.
- Capture decisions in real time on a shared screen. No 'we'll write it up later'.
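The 7-minute time-box has a practical consequence for the pre-read: a 60-minute outlier review leaves room for only a handful of cases. A small arithmetic sketch, with hypothetical names and assuming no time is lost between cases:

```python
OUTLIER_REVIEW_MINUTES = 60  # length of the outlier review block
TIME_BOX_MINUTES = 7         # per-case limit from the playbook


def max_cases(review_minutes, time_box_minutes):
    """Number of full time-boxed cases that fit in the review block."""
    return review_minutes // time_box_minutes


cases = max_cases(OUTLIER_REVIEW_MINUTES, TIME_BOX_MINUTES)
leftover = OUTLIER_REVIEW_MINUTES - cases * TIME_BOX_MINUTES
print(cases, leftover)  # 8 4
```

If managers propose more than eight top-rating and bottom-rating cases, the facilitator will likely need to prioritise them before the session rather than during it.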
Post-calibration discipline
- Within 48h: HR analyses final distribution by protected class and reports back to the group.
- Managers brief their direct reports within 2 weeks using the calibrated rating.
- Compensation decisions flow from calibrated ratings, not pre-calibration proposals.
- Notes from the room never leave the room. Confidentiality is the only reason managers will speak honestly next time.
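The 48-hour distribution analysis can be approximated with a simple rate comparison. The sketch below is a minimal illustration, not a prescribed method: the function names, the record shape, the definition of a "top" rating as 4 or above, and the 0.8 flag threshold are all assumptions made for the example. A flagged group is a prompt for HR follow-up, not a finding of bias, and small groups will produce noisy ratios.

```python
from collections import Counter


def top_rating_rates(records, top_threshold=4):
    """Share of people in each group whose final rating is >= top_threshold.

    `records` is a list of (group, rating) pairs; both the shape and the
    threshold are illustrative assumptions.
    """
    totals, tops = Counter(), Counter()
    for group, rating in records:
        totals[group] += 1
        if rating >= top_threshold:
            tops[group] += 1
    return {g: tops[g] / totals[g] for g in totals}


def flag_groups(rates, min_ratio=0.8):
    """Groups whose top-rating share is below min_ratio of the highest share."""
    best = max(rates.values())
    if best == 0:
        return []  # nobody received a top rating, so there is nothing to compare
    return sorted(g for g, r in rates.items() if r / best < min_ratio)


# Hypothetical calibrated ratings, grouped by one audited attribute.
final = [
    ("group_a", 4), ("group_a", 5), ("group_a", 3), ("group_a", 4),
    ("group_b", 3), ("group_b", 4), ("group_b", 3), ("group_b", 2),
]
rates = top_rating_rates(final)
flagged = flag_groups(rates)
print(rates)    # {'group_a': 0.75, 'group_b': 0.25}
print(flagged)  # ['group_b']
```

The same comparison can be repeated for each attribute the session checks (gender, ethnicity, tenure, team), one attribute at a time.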