Google DORA issues platform engineering caveats

As with generative AI, the same techniques that can boost enterprise developer productivity can also slow and destabilize overall software delivery.

Overall, the platform engineering trend in enterprise IT is a positive one, according to an influential DevOps report -- but it requires continuous improvement to avoid potentially significant downsides.

Platform engineering, the practice of maintaining an internal self-service IT infrastructure on behalf of development teams, has gone mainstream over the last two years as full-stack developers proved to be in short supply for enterprise organizations. Generally, this approach has worked well, according to the Google Cloud DevOps Research and Assessment (DORA) team's annual "Accelerate State of DevOps" survey of 3,000 organizations.

"Internal developer platform users had 8% higher levels of individual productivity and 10% higher levels of team performance," according to the survey report released Oct. 22. "Additionally, an organization's software delivery and operations performance increases 6% when using a platform."

But amid the findings, a warning also emerged.

"These gains do not come without some drawbacks," according to the report. "Throughput and change stability saw decreases of 8% and 14%, respectively, which was a surprising result."

Platform engineering and AI's influence

In DORA's terminology, delivery throughput measures the speed of making updates to applications. Change stability measures whether updates lead to additional work due to failures: higher stability means failures are less likely, so a decrease in stability means more rework. Another section of the report found that AI-powered coding tools had similar downsides for overall software delivery.

DORA can't say definitively whether the two trends are connected based on this year's data, but "it's definitely part of our hypothesis," said Nathen Harvey, DORA lead and developer advocate at Google Cloud.

"Perhaps the platform isn't doing enough to provide test infrastructure and test frameworks and so forth," Harvey said. "In either case, you're getting less quality and maybe less quantity of feedback on changes as you go. … Maybe we're not getting feedback fast enough."

In one analyst's view, this disparity is rooted in the uneven use of AI automation between software development and platform teams so far.

"Code velocity and stability … have always been a problem with hyper-productive organizations," said Andy Thurai, an analyst at Constellation Research. "If the code velocity is increased to multitudes higher, but the release engineering, testing and SRE are all still done in old-school fashion, this will lead to a very unstable environment."

Platform engineering friction, security tradeoffs

Some of the imbalance between platform engineering and software delivery velocity is inherent to the practice, according to the DORA report.

"The added machinery that changes need to pass through before getting deployed to production decreases the overall throughput of changes," the report stated. "In general, when an internal developer platform is being used to build and deliver software, there is usually an increase in the number of 'handoffs' between systems and implicitly teams."

That "added machinery" might introduce friction, but it could also stem from what TechTarget's Enterprise Strategy Group found to be a primary motivator for platform adoption in recent research: improved security.

"Integrating security policies directly into [infrastructure-as-code] and GitOps workflows ensures that security is embedded into the overall development lifecycle, instead of being treated as an afterthought," according to the analyst firm's Application Modernization and the Role of Platform Engineering report published in October. "Automated monitoring and real-time security compliance checks can help maintain consistent policies across environments, protecting application infrastructure from threats while maintaining operational agility."

However, another Google DORA hypothesis is that an increased sense of safety to experiment with changes could lead to change instability in some platform environments.

"In this instance the higher level of instability isn't necessarily a bad thing since the platform is empowering teams to experiment and deliver changes, which results in an increased level of change failure and rework," according to the DORA report.

Or it could be that the platform is ineffective at ensuring the quality of changes made to production, in which case it must be reworked, the report stated.

Platform engineering setbacks can be temporary

In either case, the DORA report found another, more encouraging pattern: Many platform engineering teams met with initial success, followed by challenges, and eventually returned to realizing productivity gains, provided they listened to developer feedback.

"We see initial performance gains at the onset of a platform engineering initiative, followed by decrease and recovery as the platform ages and matures," according to the report. "This pattern is typical of transformation initiatives as early gains are realized, and then headwinds are encountered as the easier gains have been realized."

That's a message that resonated with one IT pro who has experience providing an internal developer platform for AI.

"Several years ago, we chose to centralize all AI models and frameworks into a standard platform … to decouple the AI technology choices and governance from product codebases," said Ian Beaver, chief scientist at Verint Systems, a contact-center-as-a-service provider in Melville, NY. "This rollout led to an initial disruption as product teams were having to spend resources removing embedded AI from their products and integrating APIs to call AI as a service."

Beaver's platform team also had to educate research teams on how to deploy new services and models onto the platform and perfect its own practices in metering, cost tracking and compliance, he said.

"Now that the major adoption pains are behind us, we have seen a marked decrease in time to market for new AI features and development effort to adopt new AI services," he said.

Verint's platform team can solve a problem once and then reuse that knowledge for multiple teams via updates to internal APIs, increasing the mileage of AI models, Beaver said. The company can also switch between large language models and cloud providers behind platform APIs without changes to software products.

"In all, the internal development platform for AI services was a large, multi-year internal undertaking," Beaver said. "But it has been a significant quality-of-life improvement for both research and product teams."

Beth Pariseau, senior news writer for TechTarget Editorial, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.
