Engineering Performance Signals: DORA, SPACE, and DX — What to Measure and What to Ignore
Engineering performance is not productivity, and productivity is not lines of code. A leader's guide to the three modern measurement frameworks — DORA for…
- DORA measures delivery performance (4 metrics). It is a team-level signal, not an individual one.
- SPACE (Forsgren, Storey, Maddila, Zimmermann, Houck) is a 5-dimension framework that resists single-metric optimisation.
- DX (Developer Experience) blends quantitative DORA-style data with qualitative survey data — closer to HR engagement than to DORA.
- All three break the moment they're used to rank or pay individuals. Goodhart's Law applies.
- Pair every quantitative metric with a qualitative one. Cycle time + 'how confident are you in this change?' is more honest than either alone.
Every CFO eventually asks the CTO: 'How do we know engineering is productive?' The honest answer is uncomfortable: there is no single number, and any single number you optimise for will be gamed within two quarters. There are, however, three serious frameworks built by researchers who have studied this problem for two decades. This is the leader's guide to DORA, SPACE, and DX — what each measures, what each misses, and how to use them without breaking the team you're trying to measure.
Why engineering measurement is different
Software engineering output is non-linear, non-additive, and frequently invisible. A senior engineer who deletes 4,000 lines of code and thereby prevents a future incident can produce more value than a junior engineer who ships 12 features. Lines of code, story points, and commit counts are widely discredited as productivity metrics, yet they remain the default in many organisations because they are easy to count. The frameworks below exist because these easy metrics actively cause harm.
Charles Goodhart's principle, popularised in management as 'When a measure becomes a target, it ceases to be a good measure', applies more strongly to engineering than to almost any other discipline. Targeting commit count produces commit-padding; targeting velocity produces inflated point estimates; targeting code coverage produces meaningless tests. Plan for this.
DORA — the four keys
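Originating from the DevOps Research and Assessment program (Nicole Forsgren, Jez Humble, and Gene Kim; published as Accelerate, 2018) and now run by Google Cloud, DORA distils a decade of research into four delivery metrics that statistically separate elite-performing organisations from low performers. The performance bands quoted below are indicative. DORA recalibrates them in each annual State of DevOps report, so check the current report before benchmarking.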
- 1. Deployment frequency: How often code is deployed to production. Elite teams deploy on demand (multiple times per day). Low performers deploy less than monthly.
- 2. Lead time for changes: Time from code commit to code running in production. Elite: less than 1 hour. Low: more than 1 month.
- 3. Change failure rate: Percentage of deployments that cause a degradation in service. Elite: 0–15%. Low: 46–60%.
- 4. Time to restore service: How long it takes to recover from a failure. Elite: less than 1 hour. Low: more than 1 month.
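The four definitions above can be computed from a list of deployment records. The sketch below is one minimal way to do it. The `Deployment` record shape, the `four_keys` function, and the choice to measure restore time from the failing deploy (rather than from incident detection) are all assumptions for illustration. None of it is an official DORA tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape: in practice these fields would be joined
    # from your CI/CD history and your incident tracker.
    commit_time: datetime                     # when the change was committed
    deploy_time: datetime                     # when it reached production
    caused_failure: bool = False              # did it degrade service?
    restored_time: Optional[datetime] = None  # when service recovered, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def four_keys(deploys: list[Deployment], period_days: int) -> dict[str, Optional[float]]:
    """Team-level DORA-style metrics for one reporting window."""
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    lead_times = [hours(d.deploy_time - d.commit_time) for d in deploys]
    failures = [d for d in deploys if d.caused_failure]
    # Assumption: restore time runs from the failing deploy, not from incident
    # detection; swap in detection time if your tracker records it.
    restores = [hours(d.restored_time - d.deploy_time)
                for d in failures if d.restored_time is not None]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "median_restore_hours": median(restores) if restores else None,
    }
```

Medians are used so that one stuck change does not dominate the lead-time figure. That is a design choice in this sketch, not part of the DORA definitions. The output is a team figure and should never be split per engineer.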
The genius of DORA is that the four metrics are paired. Deployment frequency and lead time measure throughput. Change failure rate and time to restore measure stability. Optimising throughput alone destroys stability; optimising stability alone freezes delivery. Elite teams move both at once — which the research showed was achievable, against the prior industry assumption that the two were a trade-off.
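Because the metrics come in pairs, a useful review habit is to flag any period where one pair improves while the other regresses. Here is a minimal sketch of that check. The `DoraSnapshot` type, the rule of "something better, nothing worse", and the warning texts are hypothetical choices, not part of DORA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoraSnapshot:
    deploys_per_day: float
    lead_time_hours: float
    change_failure_rate: float  # fraction, 0.0 to 1.0
    restore_hours: float


def _direction(better: list[bool], worse: list[bool]) -> int:
    """+1 if something improved and nothing regressed, -1 for the reverse, else 0."""
    if any(better) and not any(worse):
        return 1
    if any(worse) and not any(better):
        return -1
    return 0


def tradeoff_warnings(prev: DoraSnapshot, curr: DoraSnapshot) -> list[str]:
    # Throughput pair: more deploys and shorter lead time are better.
    throughput = _direction(
        [curr.deploys_per_day > prev.deploys_per_day, curr.lead_time_hours < prev.lead_time_hours],
        [curr.deploys_per_day < prev.deploys_per_day, curr.lead_time_hours > prev.lead_time_hours],
    )
    # Stability pair: lower failure rate and faster restore are better.
    stability = _direction(
        [curr.change_failure_rate < prev.change_failure_rate, curr.restore_hours < prev.restore_hours],
        [curr.change_failure_rate > prev.change_failure_rate, curr.restore_hours > prev.restore_hours],
    )
    warnings = []
    if throughput == 1 and stability == -1:
        warnings.append("Throughput improved while stability regressed.")
    if throughput == -1 and stability == 1:
        warnings.append("Stability improved while throughput fell.")
    return warnings
```

A period where both pairs improve returns no warnings. That is the "move both at once" pattern the research describes.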
DORA is a team-level signal. It tells you how the delivery system is performing, not how any individual is performing. Using DORA to compare engineers is a category error.
SPACE — the five dimensions
Published by Forsgren, Storey, Maddila, Zimmermann, and Houck (ACM Queue, 2021), the SPACE framework was a reaction to organisations reducing developer productivity to a single axis, whether activity counts or DORA numbers used as a productivity stick. The authors argue that developer productivity is irreducibly multi-dimensional, and that any honest measurement program must capture several dimensions and pair quantitative with qualitative signals.
- S (Satisfaction & wellbeing): Survey-based. Burnout risk, satisfaction with tools, sense of purpose. Without this dimension, the others are misleading.
- P (Performance): Outcome of the work. Reliability, quality, customer outcomes; not output volume.
- A (Activity): Volume of actions: commits, PRs, deploys, code reviews. Useful as context, dangerous as a target.
- C (Communication & collaboration): Quality of working relationships. PR review turnaround, knowledge sharing, cross-team coordination cost.
- E (Efficiency & flow): Ability to do work with minimal interruption. Uninterrupted focus time, handoff count, wait times in the system.
Never measure just one dimension. The authors recommend at least three dimensions, across at least two levels (individual, team, system), with at least one being qualitative. A team scoring well on Activity and Performance but poorly on Satisfaction is on the path to attrition — and you would never see it with DORA alone.
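The coverage rule above (three or more dimensions, two or more levels, at least one qualitative signal) can be checked mechanically against a proposed metric plan. Below is a minimal sketch that takes those thresholds from the paragraph above. The `Metric` type, the level names, and the example metrics are hypothetical.

```python
from dataclasses import dataclass

DIMENSIONS = {"S", "P", "A", "C", "E"}
LEVELS = {"individual", "team", "system"}


@dataclass(frozen=True)
class Metric:
    name: str
    dimension: str     # one of DIMENSIONS
    level: str         # one of LEVELS
    qualitative: bool  # True for survey- or interview-based signals


def coverage_problems(plan: list[Metric]) -> list[str]:
    """Return the ways a metric plan falls short of the SPACE coverage rule."""
    problems = []
    for m in plan:
        if m.dimension not in DIMENSIONS:
            problems.append(f"{m.name}: unknown dimension {m.dimension!r}")
        if m.level not in LEVELS:
            problems.append(f"{m.name}: unknown level {m.level!r}")
    dims = {m.dimension for m in plan} & DIMENSIONS
    levels = {m.level for m in plan} & LEVELS
    if len(dims) < 3:
        problems.append(f"covers {len(dims)} dimension(s); need at least 3")
    if len(levels) < 2:
        problems.append(f"covers {len(levels)} level(s); need at least 2")
    if not any(m.qualitative for m in plan):
        problems.append("no qualitative metric")
    return problems
```

For example, a plan built only from commit and PR counts fails all three checks, which is exactly the Activity-only trap the authors warn about.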
DX — Developer Experience as a measurement frame
DX is the youngest of the three frames and the closest to HR engagement work. Developer Experience asks: how does it feel to do engineering work here? It blends DORA-style throughput data with structured developer surveys (typically quarterly), measuring friction points along the inner loop (code, test, build, debug) and outer loop (review, deploy, monitor).
- 1. Quantitative baseline: DORA metrics plus inner-loop signals: local build time, test suite time, CI duration, PR review wait time, deployment frequency by team.
- 2. Quarterly developer survey: Structured Likert-scale questions on confidence in the codebase, ease of making changes, quality of dev tooling, and perceived support from leadership. Run every quarter; trend over time.
- 3. Targeted qualitative interviews: Each quarter, interview 5–8 engineers across seniority levels and teams to find the friction points the survey alone misses.
- 4. Friction backlog: Treat developer-experience pain points as a real backlog. Owner, priority, due date: the same discipline as customer-facing work.
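The survey and backlog steps connect naturally. Aggregate each quarter's Likert answers, compare them with the previous quarter, and send the questions that fell most to the friction backlog. The sketch below assumes 1-5 scores, per-question means, and a hypothetical drop threshold of 0.3. Some teams report the share of favourable answers instead of means, and either choice is defensible.

```python
from statistics import fmean


def question_means(responses: list[dict[str, int]]) -> dict[str, float]:
    """Mean 1-5 Likert score per question for one quarter's anonymous responses."""
    scores: dict[str, list[int]] = {}
    for response in responses:
        for question, score in response.items():
            if not 1 <= score <= 5:
                raise ValueError(f"{question!r}: score {score} is outside 1-5")
            scores.setdefault(question, []).append(score)
    return {q: fmean(s) for q, s in scores.items()}


def quarter_deltas(prev: dict[str, float], curr: dict[str, float]) -> dict[str, float]:
    """Change in mean score for each question asked in both quarters."""
    return {q: round(curr[q] - prev[q], 2) for q in curr.keys() & prev.keys()}


def friction_candidates(deltas: dict[str, float], threshold: float = -0.3) -> list[str]:
    """Questions whose mean fell by at least abs(threshold), worst first (hypothetical cut-off)."""
    falling = [q for q, d in deltas.items() if d <= threshold]
    return sorted(falling, key=lambda q: deltas[q])
```

Each returned question becomes a backlog item with an owner and a due date. The quarterly interviews are there to explain *why* the score dropped.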
How the three frameworks fit together
Reach for DORA when:
- The exec team wants a delivery scorecard
- You're benchmarking delivery maturity against industry
- You're justifying a platform investment
- You can capture all four metrics reliably from CI/CD
Reach for SPACE and DX when:
- You're measuring whether the team is sustainable
- You're investing in developer experience or platform tooling
- You're seeing throughput numbers improve but attrition rising
- You need to defend the team against single-metric pressure
In practice, mature engineering organisations run DORA at the delivery layer, SPACE at the team-health layer, and DX at the tooling and platform layer. They are complementary, not competing.
HR engagement scores vs. engineering signals
Standard HR engagement surveys (Gallup Q12, Glint, Culture Amp generic templates) tend to under-measure engineering reality because they were built for general knowledge work. They do not ask about CI build times, on-call load, code review wait time, or whether the engineer can ship a small change in under a day. The result is engineering teams scoring as 'engaged' on the HR dashboard while quietly being miserable.
| What HR measures | What engineers experience | Add this question |
|---|---|---|
| 'I have the tools to do my job' | Build is 8 minutes, CI is 22 minutes, repro takes 30 minutes | 'How long does it take you to validate a small change end-to-end?' |
| 'My workload is manageable' | Carries 40% of the on-call pages | 'How often are you paged outside working hours in a typical week?' |
| 'I get useful feedback' | PRs sit in review for 3 days | 'What is your typical PR review wait time?' |
| 'I have growth opportunities' | Hasn't shipped a substantial change in a quarter | 'When did you last own a meaningful piece of work end-to-end?' |
Common failure modes when measuring engineers
- Stack-ranking individuals by DORA metrics. Destroys collaboration and rewards risk-aversion.
- Treating SPACE's Activity dimension as productivity. The authors specifically warn against this in the original paper.
- Running developer surveys without acting on the results. Engagement collapses faster than if you had never asked.
- Comparing teams to each other across very different problem domains. A platform team and a product team have different baselines.
- Letting finance build the metrics. Engineering measurement needs to be co-designed with the people being measured.
- Measuring everything and deciding nothing. Pick 5–7 metrics, review them monthly, change the system based on them.
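One way to avoid cross-domain comparisons is to compare each team only against its own history. Here is a minimal sketch of that, assuming a lower-is-better metric such as lead time in hours. The function name, the median baseline, and the sample numbers are all hypothetical.

```python
from statistics import median


def change_vs_own_baseline(history: dict[str, list[float]],
                           current: dict[str, float]) -> dict[str, float]:
    """Percent change of each team's current value against the median of its own past values."""
    result = {}
    for team, past in history.items():
        if not past or team not in current:
            continue  # no baseline or no current reading: nothing to compare
        baseline = median(past)
        if baseline == 0:
            continue  # percent change is undefined against a zero baseline
        result[team] = round(100 * (current[team] - baseline) / baseline, 1)
    return result
```

With a platform team's past lead times of 40, 50, and 45 hours and a current 36, the result is -20.0, a 20% improvement. A product team moving from a 5-hour baseline to 6 hours shows +20.0, a regression. A raw side-by-side comparison would have said the opposite about which team is doing well.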
A pragmatic starter measurement stack
- 1. Deployment frequency (team): From CI/CD. Trend monthly.
- 2. Lead time for changes (team): Commit-to-prod. Trend monthly.
- 3. Change failure rate (team): % of deploys causing incidents. Trend quarterly.
- 4. PR review wait time (team): Median hours from PR open to first review. Surface bottlenecks.
- 5. On-call pages per engineer per shift: From PagerDuty/Opsgenie. Trend monthly. Triggers a redesign if >2.
- 6. Quarterly developer satisfaction (org): 10-question survey covering tooling, focus time, on-call, growth. Anonymous.
- 7. Voluntary attrition by tenure & team: From HR. Reviewed alongside the above, never in isolation.
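Items 4 and 5 are the least standardised in the stack, so here is a minimal sketch of each. It assumes you have already exported PR open and first-review timestamps and per-shift page counts (the export step and the input shapes are hypothetical). It also reads "more than 2 pages per shift" as an average across each engineer's shifts, which is our interpretation.

```python
from datetime import datetime
from statistics import median
from typing import Optional


def median_review_wait_hours(prs: list[tuple[datetime, Optional[datetime]]]) -> Optional[float]:
    """Median hours from PR open to first review.

    PRs with no review yet are skipped, which biases the figure downward;
    track the count of unreviewed PRs separately.
    """
    waits = [(first - opened).total_seconds() / 3600
             for opened, first in prs if first is not None]
    return median(waits) if waits else None


def overloaded_engineers(pages_per_shift: dict[str, list[int]], limit: float = 2.0) -> list[str]:
    """Engineers whose average pages per on-call shift exceed the limit."""
    return sorted(
        engineer for engineer, shifts in pages_per_shift.items()
        if shifts and sum(shifts) / len(shifts) > limit
    )
```

A non-empty result from `overloaded_engineers` is a signal to redesign the rotation or fix the noisy alerts. It is not a signal about the engineers named.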