Engineering Performance Signals: DORA, SPACE, and DX — What to Measure and What to Ignore

60-Second Summary

DORA measures delivery performance (4 metrics). It is a team-level signal, not an individual one.
SPACE (Forsgren, Storey, Maddila, Zimmermann, Houck) is a 5-dimension framework that resists single-metric optimisation.
DX (Developer Experience) blends quantitative DORA-style data with qualitative survey data — closer to HR engagement than to DORA.
All three break the moment they're used to rank or pay individuals. Goodhart's Law applies.
Pair every quantitative metric with a qualitative one. Cycle time + 'how confident are you in this change?' is more honest than either alone.

Every CFO eventually asks the CTO: 'How do we know engineering is productive?' The honest answer is uncomfortable: there is no single number, and any single number you optimise for will be gamed within two quarters. There are, however, three serious frameworks built by researchers who have studied this problem for two decades. This is the leader's guide to DORA, SPACE, and DX — what each measures, what each misses, and how to use them without breaking the team you're trying to measure.

Why engineering measurement is different

Software engineering output is non-linear, non-additive, and frequently invisible. A senior engineer who removes 4,000 lines of code and prevents a future incident may have produced more value than a junior who shipped 12 features. Lines of code, story points, and commits are widely discredited as productivity metrics, yet they remain the default in many organisations because they are easy to count. The frameworks below exist because the easy metrics actively cause harm.

Goodhart's Law

Charles Goodhart's principle, popularised in management as 'When a measure becomes a target, it ceases to be a good measure', applies more strongly to engineering than almost any other discipline. Targeting commit count produces commit-padding; targeting velocity produces inflated point estimates; targeting code coverage produces meaningless tests. Plan for this.

DORA — the four keys

Originating from the DevOps Research and Assessment program (Nicole Forsgren, Jez Humble, Gene Kim — published as Accelerate, 2018) and now run by Google Cloud, DORA distils a decade of research into four delivery metrics that statistically separate elite-performing organisations from low performers.

The four DORA metrics

1. Deployment frequency
How often code is deployed to production. Elite teams deploy on demand (multiple times per day). Low performers deploy less than monthly.
2. Lead time for changes
Time from code commit to code running in production. Elite: less than 1 hour. Low: more than 1 month.
3. Change failure rate
Percentage of deployments that cause a degradation in service. Elite: 0–15%. Low: 46–60%.
4. Time to restore service
How long it takes to recover from a failure. Elite: less than 1 hour. Low: more than 1 month.

These bands are the ones quoted in this guide. DORA re-derives its performance clusters in each annual report, so the thresholds shift from year to year.
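As a rough illustration, the four metrics can be derived from a list of deployment records. This is a minimal sketch, not DORA's own tooling: the `Deployment` record shape, its field names, and the choice of medians are assumptions made for the example, and real data would come from your CI/CD and incident systems.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape, not a DORA schema)."""
    committed_at: datetime                  # commit time of the change
    deployed_at: datetime                   # when the change reached production
    caused_failure: bool = False            # did this deploy degrade service?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def four_keys(deploys: list[Deployment], window_days: int) -> dict[str, float]:
    """Compute the four metrics for one team over a window of `window_days` days."""
    if not deploys:
        raise ValueError("no deployments in window")
    lead_times = [hours(d.deployed_at - d.committed_at) for d in deploys]
    failures = [d for d in deploys if d.caused_failure]
    restores = [
        hours(d.restored_at - d.deployed_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        # NaN when nothing failed, or nothing has been restored yet
        "median_restore_hours": median(restores) if restores else float("nan"),
    }


# Illustrative data only: four deploys in one week, one of which failed.
t0 = datetime(2026, 1, 5, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=4)),
    Deployment(
        t0 + timedelta(days=2),
        t0 + timedelta(days=2, hours=6),
        caused_failure=True,
        restored_at=t0 + timedelta(days=2, hours=7),
    ),
    Deployment(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=8)),
]
print(four_keys(sample, window_days=7))
```

The function takes one team's deployments at a time on purpose. Nothing in the record identifies an individual engineer, which keeps the output a team-level signal.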

Throughput and stability together

The genius of DORA is that the four metrics are paired. Deployment frequency and lead time measure throughput. Change failure rate and time to restore measure stability. Optimising throughput alone destroys stability; optimising stability alone freezes delivery. Elite teams move both at once — which the research showed was achievable, against the prior industry assumption that the two were a trade-off.
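One way to keep the pairing honest is to read throughput and stability together whenever two periods are compared. The sketch below is an illustration under simple assumptions: the `PeriodMetrics` fields and the `read_pair` labels are hypothetical, and it treats any movement as real, with no tolerance for normal month-to-month noise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodMetrics:
    """Team-level DORA readings for one period (field names are illustrative)."""
    deploys_per_day: float
    lead_time_hours: float
    change_failure_rate: float  # fraction between 0.0 and 1.0
    restore_hours: float


def read_pair(prev: PeriodMetrics, curr: PeriodMetrics) -> str:
    """Describe how throughput and stability moved between two periods."""
    # Throughput counts as up only if deploys rose and lead time did not get worse.
    throughput_up = (
        curr.deploys_per_day > prev.deploys_per_day
        and curr.lead_time_hours <= prev.lead_time_hours
    )
    # Stability counts as down if either stability metric got worse.
    stability_down = (
        curr.change_failure_rate > prev.change_failure_rate
        or curr.restore_hours > prev.restore_hours
    )
    throughput = "throughput up" if throughput_up else "throughput flat or down"
    stability = "stability down" if stability_down else "stability held or improved"
    return f"{throughput}, {stability}"


last_quarter = PeriodMetrics(1.0, 24.0, 0.10, 2.0)
this_quarter = PeriodMetrics(3.0, 6.0, 0.25, 2.0)
print(read_pair(last_quarter, this_quarter))
```

A reading of "throughput up, stability down" is the pattern the pairing exists to catch: faster delivery bought with more failed changes.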

DORA is a team-level signal. It tells you how the delivery system is performing, not how any individual is performing. Using DORA to compare engineers is a category error.

SPACE — the five dimensions

Published by Forsgren, Storey, Maddila, Zimmermann, and Houck (ACM Queue, 2021), the SPACE framework was an explicit reaction to organisations using DORA as a single-axis productivity stick. The authors argue that developer productivity is irreducibly multi-dimensional, and that any honest measurement program must capture multiple dimensions and pair quantitative with qualitative signals.

SPACE — the five dimensions and example metrics

1. S: Satisfaction & wellbeing
Survey-based. Burnout risk, satisfaction with tools, sense of purpose. Without this dimension, the others are misleading.
2. P: Performance
Outcome of the work. Reliability, quality, customer outcomes, not output volume.
3. A: Activity
Volume of actions: commits, PRs, deploys, code reviews. Useful as context, dangerous as a target.
4. C: Communication & collaboration
Quality of working relationships. PR review turnaround, knowledge sharing, cross-team coordination cost.
5. E: Efficiency & flow
Ability to do work with minimal interruption. Uninterrupted focus time, handoff count, wait times in the system.

The SPACE rule of three

Never measure just one dimension. The authors recommend at least three dimensions, across at least two levels (individual, team, system), with at least one being qualitative. A team scoring well on Activity and Performance but poorly on Satisfaction is on the path to attrition — and you would never see it with DORA alone.
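The rule of three is easy to check mechanically before a metric set goes on a dashboard. The following is a minimal sketch: the `Metric` type, the dimension and level labels, and the assignment of each example metric to a SPACE dimension are assumptions for illustration, not classifications taken from the paper.

```python
from dataclasses import dataclass

DIMENSIONS = {"satisfaction", "performance", "activity", "communication", "efficiency"}
LEVELS = {"individual", "team", "system"}


@dataclass(frozen=True)
class Metric:
    name: str
    dimension: str     # one of DIMENSIONS
    level: str         # one of LEVELS
    qualitative: bool  # True for survey or interview based signals


def check_metric_set(metrics: list[Metric]) -> list[str]:
    """Return the rule-of-three problems found; an empty list means the set passes."""
    problems = []
    for m in metrics:
        if m.dimension not in DIMENSIONS or m.level not in LEVELS:
            problems.append(f"{m.name}: unknown dimension or level")
    dims = {m.dimension for m in metrics} & DIMENSIONS
    levels = {m.level for m in metrics} & LEVELS
    if len(dims) < 3:
        problems.append(f"only {len(dims)} dimension(s) covered, need at least 3")
    if len(levels) < 2:
        problems.append(f"only {len(levels)} level(s) covered, need at least 2")
    if not any(m.qualitative for m in metrics):
        problems.append("no qualitative metric")
    return problems


# A delivery-only set: the dimension labels here are this example's own guesses.
delivery_only = [
    Metric("deployment frequency", "activity", "team", qualitative=False),
    Metric("lead time for changes", "efficiency", "team", qualitative=False),
    Metric("change failure rate", "performance", "team", qualitative=False),
]
print(check_metric_set(delivery_only))

# Adding one survey-based signal at a second level makes the set pass.
balanced = delivery_only + [
    Metric("satisfaction with tooling", "satisfaction", "individual", qualitative=True),
]
print(check_metric_set(balanced))
```

Under these labels the delivery-only set fails on levels and on the qualitative requirement, which mirrors the point above: a set built only from delivery data has no way to show a satisfaction problem.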

DX — Developer Experience as a measurement frame

DX is the youngest of the three frames and the closest to HR engagement work. Developer Experience asks: how does it feel to do engineering work here? It blends DORA-style throughput data with structured developer surveys (typically quarterly), measuring friction points along the inner loop (code, test, build, debug) and outer loop (review, deploy, monitor).

The DX measurement loop

1. Quantitative baseline
DORA metrics plus inner-loop signals: local build time, test suite time, CI duration, PR review wait time, deployment frequency by team.
2. Quarterly developer survey
Structured Likert-scale questions on: confidence in the codebase, ease of making changes, quality of dev tooling, perceived support from leadership. Run every quarter; trend over time.
3. Targeted qualitative interviews
Each quarter, interview 5–8 engineers across seniority and team to find the friction points the survey alone misses.
4. Friction backlog
Treat developer-experience pain points as a real backlog. Owner, priority, due date: the same discipline as customer-facing work.
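The survey step only pays off when results are trended rather than read once. Here is a minimal sketch of that trend for a single question, assuming a 1 to 5 Likert scale, "favourable" meaning a 4 or 5, and quarter labels such as `2026-Q1` that sort chronologically. The function names and sample scores are hypothetical.

```python
from statistics import mean
from typing import Optional


def summarise(scores: list[int]) -> dict[str, float]:
    """Mean, favourable share (answers of 4 or 5), and respondent count for one question."""
    if not scores:
        raise ValueError("no responses")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("Likert scores must be between 1 and 5")
    favourable = sum(1 for s in scores if s >= 4) / len(scores)
    return {"mean": round(mean(scores), 2), "favourable": round(favourable, 2), "n": len(scores)}


def favourable_trend(
    by_quarter: dict[str, list[int]],
) -> list[tuple[str, float, Optional[float]]]:
    """Rows of (quarter, favourable share, change vs previous quarter), oldest first."""
    rows = []
    previous = None
    for quarter in sorted(by_quarter):
        share = summarise(by_quarter[quarter])["favourable"]
        delta = None if previous is None else round(share - previous, 2)
        rows.append((quarter, share, delta))
        previous = share
    return rows


# Illustrative answers to "It is easy to make changes to our codebase".
ease_of_change = {
    "2025-Q4": [4, 5, 3, 2],
    "2026-Q1": [4, 4, 5, 2],
}
for row in favourable_trend(ease_of_change):
    print(row)
```

Keeping the respondent count `n` in the summary makes it straightforward to suppress results for groups too small to stay anonymous.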

How the three frameworks fit together

When to reach for each framework

Use DORA when…

The exec team wants a delivery scorecard
You're benchmarking delivery maturity against industry
You're justifying a platform investment
You can capture all four metrics reliably from CI/CD

Use SPACE or DX when…

You're measuring whether the team is sustainable
You're investing in developer experience or platform tooling
You're seeing throughput numbers improve but attrition rising
You need to defend the team against single-metric pressure

In practice, mature engineering organisations run DORA at the delivery layer, SPACE at the team-health layer, and DX at the tooling and platform layer. They are complementary, not competing.

HR engagement scores vs. engineering signals

Standard HR engagement surveys (Gallup Q12, Glint, Culture Amp generic templates) tend to under-measure engineering reality because they were built for general knowledge work. They do not ask about CI build times, on-call load, code review wait time, or whether the engineer can ship a small change in under a day. The result is engineering teams scoring as 'engaged' on the HR dashboard while quietly being miserable.

Where engagement surveys miss engineering reality

What HR measures	What engineers experience	Add this question
'I have the tools to do my job'	Build is 8 minutes, CI is 22 minutes, repro takes 30 minutes	'How long does it take you to validate a small change end-to-end?'
'My workload is manageable'	Carries 40% of the on-call pages	'How often are you paged outside working hours in a typical week?'
'I get useful feedback'	PRs sit in review for 3 days	'What is your typical PR review wait time?'
'I have growth opportunities'	Hasn't shipped a substantial change in a quarter	'When did you last own a meaningful piece of work end-to-end?'

Common failure modes when measuring engineers

Stack-ranking individuals by DORA metrics. Destroys collaboration and rewards risk-aversion.
Treating SPACE's Activity dimension as productivity. The authors specifically warn against this in the original paper.
Running developer surveys without acting on the results. Engagement collapses faster than if you had never asked.
Comparing teams to each other across very different problem domains. A platform team and a product team have different baselines.
Letting finance build the metrics. Engineering measurement needs to be co-designed with the people being measured.
Measuring everything and deciding nothing. Pick 5–7 metrics, review them monthly, change the system based on them.

A pragmatic starter measurement stack

The seven-metric starter stack for a 50-engineer org

1. Deployment frequency (team)
From CI/CD. Trend monthly.
2. Lead time for changes (team)
Commit-to-prod. Trend monthly.
3. Change failure rate (team)
Percentage of deploys causing incidents. Trend quarterly.
4. PR review wait time (team)
Median hours from PR open to first review. Surface bottlenecks.
5. On-call pages per engineer per shift
From PagerDuty/Opsgenie. Trend monthly. Triggers a redesign if it exceeds 2.
6. Quarterly developer satisfaction (org)
10-question survey covering tooling, focus time, on-call, growth. Anonymous.
7. Voluntary attrition by tenure & team
From HR. Reviewed alongside the above, never in isolation.
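Two of the stack's metrics, PR review wait time and on-call load, are simple to compute once the raw events are exported. The sketch below assumes hypothetical input shapes (pairs of timestamps for PRs, per-shift page counts keyed by an anonymised rota slot) and reads "per engineer per shift" as an average across that slot's shifts. It does not use the PagerDuty or Opsgenie APIs.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Optional

PAGES_PER_SHIFT_THRESHOLD = 2  # from the starter stack: above 2 triggers a redesign


def median_review_wait_hours(
    prs: list[tuple[datetime, Optional[datetime]]],
) -> Optional[float]:
    """Median hours from PR open to first review; PRs with no review yet are skipped."""
    waits = [
        (first_review - opened).total_seconds() / 3600
        for opened, first_review in prs
        if first_review is not None
    ]
    return median(waits) if waits else None


def pages_per_shift(pages_by_shift: dict[str, list[int]]) -> dict[str, float]:
    """Average pages per shift for each rota slot (assumed reading of the metric)."""
    return {slot: sum(counts) / len(counts) for slot, counts in pages_by_shift.items() if counts}


def over_threshold(pages_by_shift: dict[str, list[int]]) -> list[str]:
    """Rota slots whose average load exceeds the threshold, sorted by name."""
    averages = pages_per_shift(pages_by_shift)
    return sorted(slot for slot, avg in averages.items() if avg > PAGES_PER_SHIFT_THRESHOLD)


# Illustrative data only.
opened = datetime(2026, 2, 2, 10, 0)
prs = [
    (opened, opened + timedelta(hours=2)),
    (opened, opened + timedelta(hours=10)),
    (opened, opened + timedelta(hours=30)),
    (opened, None),  # still waiting for a first review
]
print(median_review_wait_hours(prs))
print(over_threshold({"slot-a": [0, 1, 2], "slot-b": [3, 4, 2]}))
```

Skipping unreviewed PRs understates the wait when many PRs are stuck, so it is worth reporting the count of open, unreviewed PRs next to the median. The over-threshold list is meant to prompt a change to the rota or the alerting, not a conversation about the person in the slot.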

Where to read further

References

DORA. State of DevOps Reports. Google Cloud / DORA.
Forsgren, Storey, Maddila, Zimmermann, Houck. The SPACE of Developer Productivity. ACM Queue, 2021.
Forsgren, Humble, Kim. Accelerate. IT Revolution, 2018.
DX. Developer Experience research. DX.

Written by Pawan Joshi. Sources cited inline.

First published 7 Mar 2026