Your Engineering Metrics Are Lying

Green dashboards hide failing teams. Learn the three metrics (cycle time, change failure rate, and team satisfaction) that create productive tension and are hard to game.

Reading time: 13 minutes

A few years ago, a CEO showed me his engineering metrics dashboard. Green across the board. Velocity up 40%. Deployment frequency doubled. Code coverage at 89%.

Yet his best engineers were quitting. The product hadn't shipped a meaningful feature in three months. Customers were unhappy.

The metrics were flawless. The business was bleeding.

This is a common failure mode I see when I'm brought in to fix engineering organisations. They've optimised for numbers that stopped meaning anything.

Teams hit targets that look great in quarterly reviews while actual work, shipping things that matter to users, grinds to a halt.

I've spent the past decade helping companies untangle this mess. The fix isn't complicated, but it requires discipline most organisations resist:

  • Define meaningful goals; then choose metrics that serve them.
  • Track trends, not just targets.
  • Kill metrics that stop driving real progress.
  • Start with a baseline triangle: cycle time, change failure rate, team satisfaction.

Goals

Before discussing metrics, we need to talk about goals. A common mistake organisations make is to pick a goal, attach a metric, and then only communicate the metric. This causes people to treat the metric as the goal. Whenever you talk about metrics, it should be in the context of the underlying goal.

I once joined a fintech scale-up whose leadership had pressured the development team to reduce the post-release defect rate. To hit the target, the team introduced mandatory code reviews for every commit, added extensive pre-merge automated tests, implemented a "zero-bug" policy that blocked releases until all known defects were fixed, and began rejecting simpler features that had historically introduced bugs.

The defect rate dropped, earning the team praise. However, the real driver of bugs was never addressed: years of technical debt from rushed deadlines, duplicated code, and brittle architecture.

Adding new features became excruciatingly slow, the codebase grew even more fragile under the weight of patches and workarounds, and developer burnout soared. Ultimately, this stalled product progress despite glowing metrics.

The real turnaround came when we expanded what we measured and started tracking cycle time, code churn, and rework rate. That refocused the team on the root cause: technical debt.

In this example, the real goal (delivering high-quality software to users) was missed because of an overly zealous focus on one particular metric. That's the best way to optimise your team into oblivion while wasting limited resources.

When driving change, focus on goals, not metrics. Metrics are just a means to an end, and you should adapt what you measure if it no longer helps meet your goal.

KPIs and OKRs

Most organisations misuse OKRs. They pick a metric, call it an "Objective," and wonder why nothing changes.

A good OKR looks like this:

Deliver unbreakable reliability to ensure seamless user experiences

  • Achieve 99.9% uptime for core services
  • Reduce average page load time by 40%
  • Lower incident response time to under 15 minutes for P1 issues

Most OKRs I see don't look like this. They look like this:

Achieve 35 story points per sprint

  • Increase average sprint velocity to 35 story points
  • Reduce cycle time to under 10 days
  • Complete 100% of committed stories in every sprint

That's a terrible OKR.

The "Objective" is literally a number pulled from a common metric. It's the KR disguised as an Objective. The last metric is probably unrealistic. The whole thing is a grouped set of velocity metrics with no broader vision. It screams "we looked at our dashboards first and wrote OKRs around them."

I've seen this exact pattern destroy a B2B SaaS team. Leadership mandated 100% sprint commitment completion. Within two quarters, engineers started gaming the metrics: padding estimates, avoiding ambitious work, splitting everything into trivial tickets. Velocity looked great. Actual output collapsed. Two senior developers who refused to play along left. The team never recovered.

This will set teams up for burnout and zero innovation.

Start with your objectives. Then, pick 2-4 key results to achieve them.

Monitor trends

Organisations tend to pick a number and watch the absolute value until the target is reached. Watching a number is easy. Figuring out whether things are moving in the right direction, and whether that number is still relevant, is harder.

Instead of monitoring absolute numbers, it's usually more beneficial to monitor trends, especially for qualitative and interpretative concepts like productivity, quality, and stability. Any number here is arbitrary, so you want to make sure it remains relevant as things progress.

A good example is test coverage. You can set a target of 93% coverage, but that may be daunting for the team if that's all you monitor. Instead, monitor the increase in test coverage over time (the trend). This will indicate early if you're on track, if your goal is feasible, and if adjustments are needed.

It will tell you when a metric stops being useful. You might reach 91% test coverage, and the last 2% might be an impossible task for various reasons, yet have no impact on quality. That's when you change your target and/or metric.
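To make the trend concrete, here's a minimal sketch in Python that projects where coverage lands if the current pace holds. The weekly readings, the weekly_trend helper, and the ten-week horizon are hypothetical; in practice you'd pull the numbers from your CI reports.

    # Minimal sketch: project test coverage from its weekly trend.
    # All numbers are illustrative, not real data.

    def weekly_trend(samples: list[tuple[int, float]]) -> float:
        """Least-squares slope: coverage points gained per week."""
        n = len(samples)
        mean_x = sum(x for x, _ in samples) / n
        mean_y = sum(y for _, y in samples) / n
        num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
        den = sum((x - mean_x) ** 2 for x, _ in samples)
        return num / den

    # Weekly coverage readings: (week number, % coverage).
    readings = [(1, 71.0), (2, 72.5), (3, 72.8), (4, 74.0), (5, 74.6)]

    slope = weekly_trend(readings)
    projected = readings[-1][1] + slope * 10  # where we land in 10 more weeks

    print(f"Gaining {slope:.2f} points/week, projected {projected:.1f}% in 10 weeks")
    # If the projection falls well short of the target, adjust the target or
    # the metric instead of pushing the team harder.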

Once a metric stops being a useful success indicator, kill it before it hinders progress. Just because it was useful in the past doesn't mean it is now. Chasing stale metrics breeds waste.

Target ranges

Another mistake organisations make is treating the target as a single number. Anything less counts as failure, which is demotivating in dynamic, innovative environments and ignores the unpredictability of real-world development.

Instead, think of each target as a landing strip. There's an ideal touchdown point at the beginning of the runway that makes the best use of the available space, but as long as you land within a certain range, you still meet your goal of landing the airplane safely.

Instead of one target number, you want to define three thresholds to evaluate target achievement:

  • Pass: This number represents the absolute minimum to meet the target. Anything below is a fail.
  • Good: This number represents the expected performance or target level.
  • Excellent: This number represents outstanding performance or exceeding your target.

Using the test coverage example, if we set these numbers to 60%, 70%, and 85%, we get the following four ranges:

A visualisation of ranges for fail (0-60%), pass (60-70%), good (70-85%), and excellent (85%+).

Using ranges instead of a single target accounts for real-world uncertainty, making goals more realistic and adaptable. It also protects motivation: teams aren't demoralised by narrowly missing a fixed number, don't stop pushing once a target is met, and are encouraged to stretch toward excellence.
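As a minimal sketch, here's how the landing strip could look in code, using the thresholds from the coverage example above. The threshold values and the evaluate helper are illustrative, not prescriptive.

    # Minimal sketch: evaluate a metric against a landing strip
    # (fail / pass / good / excellent) instead of a single target.
    THRESHOLDS = {"pass": 60.0, "good": 70.0, "excellent": 85.0}

    def evaluate(value: float, thresholds: dict[str, float] = THRESHOLDS) -> str:
        """Map a metric value onto its range."""
        if value >= thresholds["excellent"]:
            return "excellent"
        if value >= thresholds["good"]:
            return "good"
        if value >= thresholds["pass"]:
            return "pass"
        return "fail"

    print(evaluate(58.0))  # fail
    print(evaluate(66.0))  # pass
    print(evaluate(78.0))  # good
    print(evaluate(91.0))  # excellent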

Start with the triangle (what to track first)

Every organisation has different challenges and OKRs to define success.

You can track a wide range of metrics. The important thing is to think about your problem or improvement areas first, come up with solid objectives, and only then pick the metrics that suit those objectives.

If you're not measuring anything right now and want to establish a baseline before choosing your key results, prioritise simple, low-overhead metrics that leverage existing tools. Focus on automation where possible, and quarterly anonymous surveys for people metrics.

For that baseline, I recommend three metrics that form a triangle. They create natural tension with each other, which makes them hard to game.

A visualisation of three metrics - cycle time, change failure rate, and team satisfaction - arranged in a triangle

Cycle time

What it is
The elapsed time from when work starts on a task to when it's deployed to production.
Why it matters
Cycle time reveals flow. If work takes weeks to ship, everything else is downstream of that problem. Long cycle times compound: they delay feedback loops, increase context-switching, and demoralise teams who never see their work reach users.
How to measure it
Pull it from your ticketing system. Most tools (Jira, Linear, Shortcut) can report this automatically. Measure from "in progress" to "deployed," not from ticket creation. You want to see how long work takes once someone picks it up.
What to watch for
Cycle time can be gamed by splitting work into artificially small tickets. If your average cycle time drops but throughput of meaningful features doesn't increase, you're probably seeing ticket inflation. Also watch the distribution, not just the average. A 5-day average with a long tail of 30-day outliers tells a different story than consistent 5-day delivery.

I worked with an e-commerce platform that had a 3-day average cycle time. Impressive on paper. But when we looked at the distribution, we found that 40% of tickets took over two weeks. Those were the complex, high-value features. The fast average masked that anything difficult got stuck in review queues for days. Once we focused on the outliers instead of the average, we found the real bottleneck: a single senior engineer who had to approve all changes.
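If you want to look at the distribution yourself, here's a minimal sketch that computes the average, median, and 90th percentile from "in progress" and "deployed" timestamps. The ticket data and field names are hypothetical; in practice you'd export them from your ticketing system.

    # Minimal sketch: cycle time distribution, not just the average.
    from datetime import datetime
    from statistics import mean, median, quantiles

    # Hypothetical tickets: when work started vs when it hit production.
    tickets = [
        {"started": "2024-03-01", "deployed": "2024-03-04"},
        {"started": "2024-03-02", "deployed": "2024-03-05"},
        {"started": "2024-03-03", "deployed": "2024-03-21"},  # the long tail
        {"started": "2024-03-05", "deployed": "2024-03-08"},
    ]

    def cycle_days(ticket: dict) -> int:
        started = datetime.fromisoformat(ticket["started"])
        deployed = datetime.fromisoformat(ticket["deployed"])
        return (deployed - started).days

    cycle_times = sorted(cycle_days(t) for t in tickets)
    p90 = quantiles(cycle_times, n=10)[-1]  # 90th percentile

    print(f"avg {mean(cycle_times):.1f}d, median {median(cycle_times)}d, p90 {p90:.1f}d")
    # A comfortable average with a large p90 means complex work is getting
    # stuck; investigate the outliers, not the mean.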

Change failure rate

What it is
The percentage of deployments resulting in an incident, rollback, or hotfix.
Why it matters
Change failure rate reveals quality. It balances the speed incentive: fast but broken is worse than slow. High failure rates erode trust: teams fear deploying, which lengthens cycle times, increases batch sizes, and raises risk. It's a death spiral.
How to measure it
Count deployments needing a rollback, causing an incident, or needing an immediate follow-up fix. Divide by total deployments. Pull this from your CI/CD pipeline and incident tracking system.
What to watch for
Sometimes teams game this by deploying less frequently and batching more changes together. The failure rate looks better, but each failure is now catastrophic. Track deployment frequency alongside change failure rate to catch this pattern. Also be clear about what counts as a "failure." If the definition is too loose, everything becomes a failure; too strict, and you miss real problems.
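As a minimal sketch, and assuming you can export deployment records with a flag for rollbacks, incidents, or immediate follow-up fixes, pairing the two metrics is a few lines:

    # Minimal sketch: change failure rate paired with deployment frequency.
    # The records are hypothetical; pull the real ones from your CI/CD
    # pipeline and incident tracker.
    deployments = [
        {"week": 12, "failed": False},
        {"week": 12, "failed": True},   # rollback, incident, or hotfix
        {"week": 12, "failed": False},
        {"week": 13, "failed": False},
        {"week": 13, "failed": False},
    ]

    failures = sum(1 for d in deployments if d["failed"])
    cfr = failures / len(deployments)
    freq = len(deployments) / len({d["week"] for d in deployments})

    print(f"change failure rate: {cfr:.0%}")
    print(f"deployment frequency: {freq:.1f} deploys/week")
    # A falling failure rate alongside a falling deployment frequency usually
    # means bigger, riskier batches rather than better quality.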

Team satisfaction

What it is
A periodic measure of team sentiment about their work, typically via eNPS (employee Net Promoter Score) or a short survey.
Why it matters
You can hit excellent delivery numbers while burning people out. Satisfaction catches the hidden cost. It's a leading indicator: declining satisfaction predicts attrition and quality problems months before they appear in other metrics.
How to measure it
Quarterly anonymous surveys. Keep them short, with a maximum of five questions. eNPS works ("How likely are you to recommend this team as a place to work?"), but I prefer adding specific questions about workload sustainability, clarity of goals, and confidence in technical direction for actionable insights.
What to watch for
If you over-survey, survey fatigue will kill response rates. Quarterly is enough. Watch for teams with high satisfaction but poor delivery metrics. Sometimes teams are comfortable because they've stopped being challenged. Satisfaction without accountability isn't health; it's stagnation.
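Scoring the survey is straightforward. Here's a minimal sketch of the eNPS calculation, assuming 0-10 answers to the question above; the responses are made up.

    # Minimal sketch: eNPS from 0-10 survey answers.
    # Promoters score 9-10, detractors 0-6; eNPS ranges from -100 to +100.
    responses = [9, 10, 8, 7, 6, 9, 3, 10, 8, 9]

    promoters = sum(1 for r in responses if r >= 9)
    detractors = sum(1 for r in responses if r <= 6)
    enps = (promoters - detractors) / len(responses) * 100

    print(f"eNPS: {enps:+.0f}")
    # Track the trend quarter over quarter; a single absolute score on its
    # own says little.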

The triangle

These metrics create productive tension. If you push cycle time too hard without fixing root causes, change failure rate climbs. If you optimise for quality so aggressively that you slow down, team satisfaction tanks from lack of progress. If you chase satisfaction scores by reducing workload, delivery suffers.

I once joined an organisation fixated on "quality" and consensus. Every change needed three reviews, architecture council sign-off, and an exhaustive test plan. Change failure rate was stellar and eNPS rose. No one felt rushed. But cycle time doubled, deployment frequency fell, and meaningful features took months to ship. Meetings replaced decisions; engineers optimised for passing gates, not delivering value. Customers churned while dashboards glowed. Quality and harmony without throughput is stasis. The triangle would have caught it.

Warning signs your metrics are lying

Patterns I look for when the numbers seem "good" but reality isn't:

  • Velocity is up, but shipped outcomes are flat: you're inflating tickets. Track throughput of user-facing outcomes (features, experiments) and story-size distribution.
  • Cycle time is down, but deployment frequency is flat: work finishes "on paper" yet waits to go live. Measure time-in-state (coding, review, deploy queue) and decouple deploy from release.
  • Change failure rate is down, but incident impact is up: fewer but bigger explosions. Pair CFR with deployment frequency and MTTR; prefer small, frequent changes.
  • Code coverage is up, but the bug rate is unchanged: you're testing the easy parts. Measure coverage on critical paths or use mutation testing; sample production error classes.
  • 100% sprint completion with zero spillover: sandbagging. Track planned vs unplanned work and % time on roadmap vs interrupts.
  • eNPS is up, but delivery is down: comfort without accountability. Tie satisfaction targets to outcome OKRs; ask "What shipped that mattered?" at each review.
  • PRs are "approved" fast, but PR age is high: rubber-stamping after long waits. Track review wait time separately from active review time; set SLAs for first response.
  • Few critical incidents, but long MTTR: you're blind, not stable. Invest in observability and SLOs; tune alert thresholds to user impact.
  • All engineering dashboards are green, but product KPIs are red: local optimisation. Add product metrics (activation, retention, WAU/MAU, NPS) to the same review.
  • Many "architecture meetings," but few releases: consensus theater. Cap approval layers, timebox decisions, and empower responsible individuals.

Rule of thumb: if two triangle points improve while the third decays, you're not optimising. You're hiding the cost. Surface it, then fix the constraint.

Next steps

If your dashboards are green but shipping has stalled, engineers are leaving, or customers are churning, you don't have a "people" problem. You have a measurement problem.

Do this in two weeks:

  • Instrument the triangle: cycle time, change failure rate, team satisfaction.
  • Set target ranges (pass/good/excellent) and review trends weekly.
  • Eliminate one stale metric that isn't driving behaviour.

If that sounds hard, I run a 2-week diagnostic to expose bottlenecks and reset OKRs, followed by a 3-6 month turnaround as interim CTO/VP Engineering or advisor. I've done this before. Let's stop bleeding and start shipping. How I can help.

If you want to go deeper later, look at DORA metrics and the SPACE framework. But start with the triangle.

Hey, I'm Frank

I take over chaotic engineering organisations in European scale-ups and get them shipping predictably again. No bloated process frameworks, no endless theater, no 200-page strategy decks. Just ruthless focus on simplicity and progress. What I do.