Intermediate · HR · PeopleOps · Manager

Kirkpatrick's four levels in modern L&D — what's still useful, what's been replaced

The 1959 evaluation model that still anchors enterprise L&D, the persistent measurement gap at Levels 3 and 4, and the Phillips ROI addition most companies skip.

10 min read Updated 2026-05-22
60-Second Summary
  • Donald Kirkpatrick's four-level evaluation model (Reaction, Learning, Behavior, Results) is still the dominant L&D framework more than 65 years after publication, because no replacement has proven better.
  • Most L&D programs measure Level 1 (satisfaction) and stop there. ATD research: only 14% of organizations measure Level 4 (results), and only 5% calculate Phillips' Level 5 ROI.
  • The measurement gap isn't about tooling — it's about willingness. Measuring behavior change requires honest before/after comparison and isolation from other variables.
  • Modern L&D adds two layers Kirkpatrick didn't: predictive analytics (will this person apply the learning?) and continuous rather than event-based measurement (Behavior at 30/90/180 days, not Behavior 'after the course').

Donald Kirkpatrick developed his four-level framework in his 1954 PhD dissertation at the University of Wisconsin and published it in 1959; it still anchors every serious L&D evaluation conversation in 2026. That's a shelf life of more than 65 years in a field that loves new frameworks. There's a reason.

The four levels revisited

Level | Question | Typical measure | Honest difficulty
1 (Reaction) | Did they like it? | Post-program survey, NPS | Easy and over-measured
2 (Learning) | Did they learn it? | Pre/post test, certification | Moderate: depends on assessment quality
3 (Behavior) | Are they doing it differently at work? | Manager observation, work-product analysis, 360 feedback | Hard: requires longitudinal observation
4 (Results) | Did it move a business outcome? | KPI movement attributable to the intervention | Very hard: confounded by other variables
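
Level 4 is hard because other variables move the same KPI. One common way to attempt isolation, not prescribed by this article, is to compare the KPI change in a trained group against the change in a similar untrained group over the same period (a difference-in-differences estimate). The sketch below assumes such a comparison group exists; the function name and all KPI values are hypothetical.

```python
from statistics import mean


def difference_in_differences(trained_before, trained_after,
                              comparison_before, comparison_after):
    """Estimate the KPI change attributable to training.

    Subtracts the change seen in an untrained comparison group from the
    change seen in the trained group, so shared background trends cancel.
    """
    trained_change = mean(trained_after) - mean(trained_before)
    comparison_change = mean(comparison_after) - mean(comparison_before)
    return trained_change - comparison_change


# Hypothetical KPI values (e.g. deals closed per rep per quarter).
trained_before = [10, 12, 11, 13]
trained_after = [14, 15, 13, 16]
comparison_before = [10, 11, 12, 11]
comparison_after = [11, 12, 13, 12]

effect = difference_in_differences(
    trained_before, trained_after, comparison_before, comparison_after
)
print(f"Estimated effect of training on the KPI: {effect:+.1f}")
```

The estimate is only as good as the comparison group: if the two groups differ in manager, territory, or tenure, those differences remain confounders.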

Where most programs stop

  • ~92% of programs measure Level 1 (ATD 2023 research)
  • ~55% measure Level 2 (same data)
  • ~30% measure Level 3 (same data)
  • ~14% measure Level 4 (same data)

The measurement collapse between Level 2 and Level 4 is the L&D function's central credibility problem. CFOs see a $5M training spend and ask what changed; the L&D team has detailed satisfaction scores and no behavior or results data. The CFO concludes correctly that nobody knows whether the spend worked.

Phillips' ROI extension

Jack Phillips added a fifth level in the 1980s: Return on Investment, calculated as ((Benefit − Cost) ÷ Cost) × 100. It's controversial because attribution is hard — but the discipline of attempting the calculation forces L&D teams to specify what behavior change they expect and what business outcome flows from it.
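
As a minimal sketch of the arithmetic, the formula above can be written as a small function. The function name and the dollar figures are hypothetical, chosen only to show the calculation; the hard part in practice is defending the benefit figure, not computing the ratio.

```python
def phillips_roi_percent(benefit: float, cost: float) -> float:
    """Return ROI as a percentage: ((benefit - cost) / cost) * 100."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    # Multiply before dividing to keep round figures exact in floating point.
    return (benefit - cost) * 100 / cost


# Hypothetical figures for illustration only.
program_cost = 200_000.0       # fully loaded cost of the program
monetized_benefit = 260_000.0  # benefit attributed to the program

roi = phillips_roi_percent(monetized_benefit, program_cost)
print(f"ROI: {roi:.1f}%")
```

A negative result means the attributed benefit did not cover the program's cost.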

When ROI calculation makes sense

Tight ROI calculation is realistic for skill-based training tied to revenue-generating behavior (sales training, customer-success methodology, technical certification). It's harder for leadership development and culture work — where the right metric is more often retention or engagement than P&L.

Modern additions

What 2020s L&D adds to Kirkpatrick
  1. Predictive analytics (Level 0)
    Before the program: who is likely to apply this learning based on role, manager, and team context? Saves spend on participants with no chance of transfer.
  2. Continuous Level 3 measurement
    Behavior measured at 30, 90, 180 days post-program, not 'right after.' Kirkpatrick's original framing was event-based; the modern version is longitudinal.
  3. Manager-as-multiplier
    Whether learning transfers is more dependent on the manager's behavior reinforcement than on program design. Track manager support as a Level 3 leading indicator.
  4. Skills-based credential capture
    Level 2 verification now flows into skills inventory for internal mobility (closes the loop with talent marketplace systems).

Frequently asked questions

Is Kirkpatrick obsolete?

No — but standalone Level 1 measurement is. The Kirkpatrick framework is still the cleanest way to think about what to measure; the obsolescence is in stopping at Level 1 and calling it evaluation.

How do we measure Level 3 without surveillance?

Manager observation at structured check-ins (30, 90, 180 days) with specific behavior anchors. Self-report with manager triangulation. 360 feedback if the program targets management or interpersonal skill. Skills demonstration for technical skills.
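
The structured check-ins can be scheduled mechanically once a program's end date is known. This is a minimal sketch: the 30/90/180-day offsets come from the answer above, while the function name and the example end date are hypothetical.

```python
from datetime import date, timedelta

# Check-in offsets in days after program completion, as described above.
CHECK_IN_OFFSETS_DAYS = (30, 90, 180)


def level3_check_in_dates(program_end: date,
                          offsets=CHECK_IN_OFFSETS_DAYS) -> list[date]:
    """Return the dates on which a manager check-in is due."""
    return [program_end + timedelta(days=days) for days in offsets]


# Hypothetical program end date.
schedule = level3_check_in_dates(date(2026, 1, 15))
for offset, due in zip(CHECK_IN_OFFSETS_DAYS, schedule):
    print(f"Day {offset:>3} check-in due {due.isoformat()}")
```

Fixing the dates up front keeps the observation longitudinal rather than dependent on someone remembering to follow up.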

Should we measure ROI on every program?

No — tight ROI calculation is expensive and only worth it for major programs (>$500k spend). For smaller programs, manager-rated Level 3 behavior change at 90 days is the most honest cost-effective measure.

Written by Pawan Joshi. Sources cited inline. Last updated 2026-05-22.