Kirkpatrick's four levels in modern L&D — what's still useful, what's been replaced

60-Second Summary

Donald Kirkpatrick's four-level evaluation model (Reaction, Learning, Behavior, Results) is still the dominant L&D framework more than 65 years after its 1959 publication, because no replacement has proven better.
Most L&D programs measure Level 1 (satisfaction) and stop there. ATD research: only 14% of organizations measure Level 4 (results), and only 5% calculate Phillips' Level 5 ROI.
The measurement gap isn't about tooling — it's about willingness. Measuring behavior change requires honest before/after comparison and isolation from other variables.
Modern L&D adds two layers Kirkpatrick didn't have: predictive analytics (will this person apply the learning?) and continuous rather than event-based measurement (Behavior at 30, 90, and 180 days, not Behavior 'after the course').

Donald Kirkpatrick developed his four-level framework in his 1954 PhD dissertation at the University of Wisconsin and published it in 1959. It still anchors every serious L&D evaluation conversation in 2026. That's a shelf life of more than 65 years in a field that loves new frameworks. There's a reason.

The four levels revisited

Level	Question	Typical measure	Honest difficulty
1 — Reaction	Did they like it?	Post-program survey, NPS	Easy and over-measured
2 — Learning	Did they learn it?	Pre/post test, certification	Moderate — depends on assessment quality
3 — Behavior	Are they doing it differently at work?	Manager observation, work-product analysis, 360 feedback	Hard — requires longitudinal observation
4 — Results	Did it move a business outcome?	KPI movement attributable to the intervention	Very hard — confounded by other variables
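
The "confounded by other variables" problem in the Level 4 row is usually tackled with a comparison group. The sketch below is one simple way to do that, a difference-in-differences estimate, and is not something the Kirkpatrick model itself prescribes. The function name and all KPI values are hypothetical, and the estimate is only as good as the assumption that the trained and untrained groups would otherwise have moved in parallel.

```python
from statistics import mean


def diff_in_diff(trained_before, trained_after, control_before, control_after):
    """KPI change in the trained group minus KPI change in a comparison group.

    Subtracting the comparison group's change removes movement that would
    have happened anyway (seasonality, pricing changes, market shifts).
    """
    trained_change = mean(trained_after) - mean(trained_before)
    control_change = mean(control_after) - mean(control_before)
    return trained_change - control_change


# Hypothetical per-person KPI values (e.g. monthly sales in $k), illustration only.
trained_before = [100.0, 110.0, 90.0]
trained_after = [120.0, 130.0, 110.0]
control_before = [100.0, 105.0, 95.0]
control_after = [105.0, 110.0, 100.0]

effect = diff_in_diff(trained_before, trained_after, control_before, control_after)
print(f"KPI change attributable to the program (estimate): {effect:.1f}")
```

With these made-up numbers the trained group rose by 20 and the comparison group by 5, so only 15 of the 20 points would be credited to the program.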

Where most programs stop

Level measured	Share of programs	Source
Level 1	~92%	ATD 2023 research
Level 2	~55%	ATD 2023 research
Level 3	~30%	ATD 2023 research
Level 4	~14%	ATD 2023 research

The measurement collapse between Level 2 and Level 4 is the L&D function's central credibility problem. CFOs see a $5M training spend and ask what changed; the L&D team has detailed satisfaction scores and no behavior or results data. The CFO concludes correctly that nobody knows whether the spend worked.

Phillips' ROI extension

Jack Phillips added a fifth level in the 1980s: Return on Investment, calculated as ((Benefit − Cost) ÷ Cost) × 100. It's controversial because attribution is hard — but the discipline of attempting the calculation forces L&D teams to specify what behavior change they expect and what business outcome flows from it.
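
The formula above is short enough to work through directly. The following is a minimal sketch with hypothetical cost and benefit figures. The hard part in practice is producing a defensible benefit number, which this code simply takes as an input.

```python
def roi_percent(benefit: float, cost: float) -> float:
    """Phillips-style ROI: ((benefit - cost) / cost) * 100."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost * 100


# Hypothetical figures for illustration only (not from any real program).
program_cost = 200_000.0       # fully loaded cost of running the program
monetized_benefit = 260_000.0  # benefit credited to the program after isolation

print(f"ROI: {roi_percent(monetized_benefit, program_cost):.1f}%")  # ROI: 30.0%
```

An ROI of 0% means the program exactly paid for itself; a negative value means the monetized benefit did not cover the cost.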

When ROI calculation makes sense

Tight ROI calculation is realistic for skill-based training tied to revenue-generating behavior (sales training, customer-success methodology, technical certification). It's harder for leadership development and culture work — where the right metric is more often retention or engagement than P&L.

Modern additions

What 2020s L&D adds to Kirkpatrick

1. Predictive analytics (Level 0): Before the program, who is likely to apply this learning based on role, manager, and team context? Saves spend on participants with no chance of transfer.
2. Continuous Level 3 measurement: Behavior measured at 30, 90, and 180 days post-program, not 'right after.' Kirkpatrick's original framing was event-based; the modern version is longitudinal.
3. Manager-as-multiplier: Whether learning transfers depends more on the manager's reinforcement of the behavior than on program design. Track manager support as a Level 3 leading indicator.
4. Skills-based credential capture: Level 2 verification now flows into the skills inventory for internal mobility (closes the loop with talent marketplace systems).
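
The continuous Level 3 cadence is easy to put on a calendar. The sketch below assumes the 30/90/180-day offsets are counted in calendar days from the program end date. The function name and the example date are hypothetical.

```python
from datetime import date, timedelta

# Check-in offsets in calendar days after the program ends.
CHECKIN_OFFSETS_DAYS = (30, 90, 180)


def checkin_dates(program_end: date, offsets=CHECKIN_OFFSETS_DAYS) -> list[date]:
    """Return the dates on which Level 3 behavior check-ins fall due."""
    return [program_end + timedelta(days=d) for d in offsets]


# Hypothetical program end date, for illustration only.
for due in checkin_dates(date(2026, 1, 15)):
    print(due.isoformat())
# 2026-02-14
# 2026-04-15
# 2026-07-14
```

Scheduling the check-ins when the program ends, rather than when someone remembers, is what makes the measurement longitudinal instead of a single event.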

Frequently asked questions

Is Kirkpatrick obsolete?

No — but standalone Level 1 measurement is. The Kirkpatrick framework is still the cleanest way to think about what to measure; the obsolescence is in stopping at Level 1 and calling it evaluation.

How do we measure Level 3 without surveillance?

Manager observation at structured check-ins (30, 90, 180 days) with specific behavior anchors. Self-report with manager triangulation. 360 feedback if the program targets management or interpersonal skill. Skills demonstration for technical skills.

Should we measure ROI on every program?

No — tight ROI calculation is expensive and only worth it for major programs (>$500k spend). For smaller programs, manager-rated Level 3 behavior change at 90 days is the most honest cost-effective measure.
