
Google's DORA DevOps report warns against metrics misuse

This year's DORA DevOps report echoes the experience of one organization that has applied its metrics in practice: DORA metrics can be powerful, but they aren't an exact science.

Google's DevOps Research and Assessment team trades in metrics -- DevOps metrics, that is. But its latest State of DevOps report also warns against overusing them.

The "Accelerate State of DevOps Report 2023" surveyed 36,000 IT pros about their experiences implementing DevOps and its effects on organizational performance.

The DevOps Research and Assessment group (DORA) suggests IT pros evaluate team performance according to four key metrics: deployment frequency; lead time for changes; change failure rate; and failed deployment recovery time, previously mean time to restore service (MTTR). Since Google acquired DORA in 2018, startups and established software development vendors including Sleuth, Harness and Atlassian have reported on DORA metrics for engineering managers.
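As a rough illustration of what those four measurements boil down to, the sketch below computes them from a hypothetical log of deployments. The record fields, sample data and 30-day window are assumptions made for the example, not the output of any DORA tool or vendor product.

```python
from datetime import datetime
from statistics import median

# Hypothetical deployment records; field names and values are illustrative only.
deployments = [
    {"deployed_at": datetime(2023, 10, 2, 14, 0), "commit_at": datetime(2023, 10, 2, 9, 0),
     "failed": False, "recovered_at": None},
    {"deployed_at": datetime(2023, 10, 4, 11, 0), "commit_at": datetime(2023, 10, 3, 16, 0),
     "failed": True, "recovered_at": datetime(2023, 10, 4, 13, 30)},
]

window_days = 30  # assumed reporting window

# Deployment frequency: deployments per day over the window.
deployment_frequency = len(deployments) / window_days

# Lead time for changes: median hours from commit to deployment.
lead_time_hours = median(
    (d["deployed_at"] - d["commit_at"]).total_seconds() / 3600 for d in deployments
)

# Change failure rate: share of deployments that caused a failure.
failures = [d for d in deployments if d["failed"]]
change_failure_rate = len(failures) / len(deployments)

# Failed deployment recovery time: median hours to recover from a failed deployment.
recovery_time_hours = median(
    (d["recovered_at"] - d["deployed_at"]).total_seconds() / 3600 for d in failures
)

print(deployment_frequency, lead_time_hours, change_failure_rate, recovery_time_hours)
```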

While DORA still takes stock of organizational performance using metrics, this year's report also warns of common pitfalls when DORA metrics are viewed as an exact science or used improperly to evaluate team-by-team performance.

"Measurement is not the goal, just as delivering software is not the goal," according to the report, which is now in its ninth year. "Fixating on performance metrics can lead to ineffective behaviors. Investing in capabilities and learning is a better way to enable success. Teams that learn the most improve the most."

Organizational pitfalls: the perils of metrics misuse

It's natural for engineering managers and executives to use DORA metric results as goals and to compare the performance of different development teams, but this is a mistake, said Nathen Harvey, developer advocate at DORA and Google Cloud.


"What I really want leaders to do is not celebrate the fastest software delivery teams," he said. "I want them to celebrate the most improved software delivery teams at the end of the year. Let's look at who improved the most. Because that's the team that's embracing this idea of continuous improvement."

Continuous improvement is never "done," which can be difficult for business leaders to absorb, Harvey added. Even the most improved team can still improve further. The slowest team within a company might have improved the most based on the specifics of the application it delivers. Comparing its metrics to a team that develops a different application -- with different constraints, infrastructure requirements and user experiences -- often isn't productive and can even be toxic, he said.

The DORA DevOps report calls for software development teams to focus not on being "feature factories," in Harvey's words, but on user experience and team well-being.

"Engineers might not know who they're building for [or] why they're building these features, they're just told, 'Ship more, ship more, ship more, ship more,'" he said. "What we see in those sorts of teams is higher burnout -- even though they're able to ship faster, they're maybe not shipping the right thing."

Teams that build software with the end user in mind, however, have 40% higher organizational performance, according to the DORA DevOps report. The ideal approach is a balance between deployment speed, operational performance and user experience, the report states.

One team's experience with DORA metrics automation

This year, Sleuth.io and Propelo, acquired by Harness in January, took further steps to make use of DORA metrics -- not just reporting on them but allowing them to trigger automated workflows to enforce best practices. Propelo's integration into the Harness DevOps platform means users can automatically trigger actions in CI/CD pipelines based on DORA metrics.

Sleuth followed suit last month with Sleuth Automations, an expanded and renamed version of Sleuth Actions, the framework the vendor developed to automate IT processes. Sleuth Automations is a set of prepackaged workflows for third-party systems, such as GitHub Actions, offered through the Sleuth Automations Marketplace.

Cobre, a corporate payment platform provider in Colombia, began using Sleuth to report on DORA metrics about a year ago. It uses Sleuth Automations to trigger Slack notifications if updates lag between QA and production as well as to automatically block pull requests (PRs) in GitHub Actions if they don't meet policy requirements.

"It doesn't allow a developer to merge a PR if it has more than 20 files changed. It's too big," said Juan Matheus, solutions architect at Cobre. This enforces the DORA-recommended best practice of making small, frequent changes rather than large updates to software.

"This can also help you to encourage your developers to push code to production faster, because they know they can't accumulate a lot of changes," he said.

As this year's DORA DevOps report suggests, applying DORA metrics can lead to a process of continuous learning and improvement. A common bottleneck identified by this year's report, slow code reviews, surfaced for Cobre as it monitored DORA metrics.

Collecting the data to measure DevOps team performance has been a process of continuous learning, even with a tool such as Sleuth, said Manuel Sanabria, product delivery director at Cobre.


Specifically, change failure rate and MTTR have been tricky for Cobre, in terms of knowing what data to collect and translating raw data from the company's New Relic observability tool into DORA metrics, he said.

Sleuth's co-founder acknowledged the difficulty Cobre faced.

"How each team defines failure is unique," said Dylan Etkin, co-founder and CEO of Sleuth. "When a team chooses to use custom metrics, like the Cobre team, it can take some configuration on the team's part to decide exactly what is a relevant metric and to understand if that metric truly represents failure for their teams or projects."

In fact, DORA agrees that MTTR has been a tricky statistic, which is why this year, that metric was reworked and renamed failed deployment recovery time, according to Harvey.

"If I push a change to production and that causes an incident, that's one type of failure," Harvey said. "If a backhoe comes and cuts the power to my data center, that's a different kind of failure. And what I need to do is very different in both of those scenarios. So we really had to focus exclusively on those failed deployments."

Still, because each DevOps team and organization is different, it's difficult to prescribe a specific procedure for collecting data on these metrics, Harvey said.

"Some teams will look at, 'Well, if there's a release …' and then within a very short time they have another release, or see a rollback, we can assume … that first release was a change failure," he said. "But it's not necessarily something [where] you can look at your version control system and get the data out of that."

Beth Pariseau, senior news writer at TechTarget, is an award-winning veteran of IT journalism. She can be reached at [email protected] or on Twitter @PariseauTT.
