
After Istio architecture upheaval, leaders pledge stability

The Istio service mesh went through major disruptions in 2020, from technical redesign to governance shifts. Now, project leaders vow to tackle lingering IT ops issues.

The past year has been volatile for the Istio service mesh project, but with several major disruptions behind it, the project's new steering committee says users can expect a smoother experience from now on.

Istio is an open source service mesh project founded in 2017 by Google, IBM and Lyft. Service mesh is a networking approach that distributes policy and security enforcement functions among a data plane of distributed proxies that report to a central control plane, and is commonly used in microservices environments.
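
To make the control plane and data plane split concrete, here is a toy model in Python. It is a sketch only and does not reflect Istio's or Envoy's actual APIs: the class names `ControlPlane`, `SidecarProxy` and `Policy`, and the service names, are all hypothetical. It illustrates one idea from the description above, that a central component distributes policy while each proxy enforces it locally.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical policy: which callers may reach a given service."""
    allowed_callers: dict[str, set[str]] = field(default_factory=dict)
    version: int = 0


class SidecarProxy:
    """Toy data-plane proxy that enforces the last policy it received."""

    def __init__(self, service: str) -> None:
        self.service = service
        self.policy = Policy()

    def receive(self, policy: Policy) -> None:
        self.policy = policy

    def allow(self, caller: str) -> bool:
        # Enforcement happens locally, without a call to the control plane.
        return caller in self.policy.allowed_callers.get(self.service, set())


class ControlPlane:
    """Toy control plane: holds the policy and pushes it to registered proxies."""

    def __init__(self) -> None:
        self.policy = Policy()
        self.proxies: list[SidecarProxy] = []

    def register(self, proxy: SidecarProxy) -> None:
        self.proxies.append(proxy)
        proxy.receive(self.policy)

    def update(self, allowed_callers: dict[str, set[str]]) -> None:
        # Each update bumps the version and is pushed to every proxy.
        self.policy = Policy(allowed_callers, self.policy.version + 1)
        for proxy in self.proxies:
            proxy.receive(self.policy)


control_plane = ControlPlane()
orders = SidecarProxy("orders")
control_plane.register(orders)
control_plane.update({"orders": {"checkout"}})
print(orders.allow("checkout"))  # True
print(orders.allow("reports"))   # False
```

In this simplified picture, the move to Istiod described below changes how the control plane side is packaged, not the basic division of labor between it and the proxies.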

The most significant technical change to the Istio architecture last year came with version 1.5, released in March, which began a move to a completely reworked control plane. In previous versions, the control plane had been based on a group of five microservices. Version 1.5 began to condense those into a single monolithic process called Istiod.

The disruptions didn't stop there. The version 1.6 release in May 2020 removed support for Kubernetes Helm charts, but the project would add Helm v3 support again with version 1.8 in November. The switch to Istiod pushed some elements of the microservices control plane into the Envoy proxy, such as authentication and authorization policy enforcement; the project also added an entirely new extension system based on WebAssembly.

For some early adopters, the shift to a monolith eased longstanding pain with upgrades, as intended.

"We honestly had problems with most upgrades from 1.1 [through] 1.5," said Joe Searcy, a member of mobile carrier T-Mobile's distributed systems technical staff, in an online interview during this week's IstioCon virtual event. "We just worry about scaling a single component now -- [upgrading from] 1.5 to 1.6 was much better due to the focus on stability in the project and us just having better tooling to catch things."


Other users weren't able to keep up with all the Istio architecture changes happening on a quarterly basis last year, according to a user experience survey conducted in the third quarter. A slight majority -- 54.1% of 61 respondents -- said they didn't upgrade Istio frequently enough. Istio research further found that 63% of Istio deployments were left with critical vulnerabilities because of upgrade delays; 35% were running non-supported older versions of Istio.

"We were sensitive to the fact that upgrades while [architectural] changes were going on might be disruptive to users, and so we wanted to counterbalance that with investments in the experience around upgrades," said Louis Ryan, principal engineer at Google, in a presentation this week. "Even so, we were getting feedback from users that new releases were hard to consume quickly enough sometimes."

Amid all this, the project was also at the center of a governance controversy after Google donated the Istio trademark to a new Open Usage Commons group rather than to the Cloud Native Computing Foundation (CNCF), which oversees Kubernetes. Community members also elected a new steering committee that included representatives from outside Google for the first time.

Under the new steering committee, maintainers began to work on a new release process with well-defined development, alpha, beta and generally available release stages, each of which now has a corresponding readiness checklist.

The istioctl command-line interface added troubleshooting commands, as well as an upgrade verification command that warns users about potential issues before they attempt an upgrade that might fail. Istio contributors now have a more systematic development workflow and testing process for new features, which includes automated testing to ensure documentation updates match code changes.

IstioCon roadmap session

The project also established a new upgrade working group to further improve the upgrade experience and will strengthen support this year for users who want to skip over versions as they upgrade.

"The Istio project has matured significantly, even just last year," said Neeraj Poddar, co-founder and chief architect at F5 Networks service mesh subsidiary AspenMesh and a member of the Istio steering committee, in a presentation. "We have come up with a pretty stable core now… [users] will get a lot of stability and still get new features, but that new feature rate might not be as aggressive as it was in 2020."

Istio looks to build on early momentum

Thanks to the backing of large IT vendors such as Google and IBM at the project's inception, Istio became the focus of most early discussions about emerging service mesh technology in 2018 and 2019. While governance issues around the Istio project opened new opportunities for service mesh competitors to emerge in 2020, a CNCF survey last year found that Istio remains the most-adopted service mesh among members. Among 1,324 respondents to the survey, 27% said they use a service mesh in production, and of that number, 47% use Istio.
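
For a sense of scale, the survey percentages can be turned into rough head counts. The short Python calculation below uses only the three figures quoted above; because the survey reported rounded percentages, the resulting counts are approximations, not numbers published by the CNCF.

```python
respondents = 1324
mesh_in_production_share = 0.27   # share of respondents using a mesh in production
istio_share_of_mesh_users = 0.47  # share of those mesh users who run Istio

# Approximate head counts implied by the rounded percentages.
mesh_users = respondents * mesh_in_production_share
istio_users = mesh_users * istio_share_of_mesh_users

# Istio's share of all respondents, not just of mesh users.
istio_share_overall = mesh_in_production_share * istio_share_of_mesh_users

print(round(mesh_users))             # 357
print(round(istio_users))            # 168
print(f"{istio_share_overall:.1%}")  # 12.7%
```

In other words, Istio's lead applies within the roughly 357 respondents running a mesh in production, which works out to about one in eight of everyone surveyed.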

Despite its unconventional governance, Istio also has the broadest contributor base among open source service mesh projects, with more than 1,900 contributors from more than 350 contributing companies, according to a presentation this week by Lin Sun, an Istio maintainer and director of open source at Solo.io.

Some enterprises that had held back on committing to Istio because of governance concerns now say they have settled on it as their service mesh of choice, in part because it still has the most community momentum and support.

"[HashiCorp] Consul [Connect] shows a lot of promise, but Istio is something the industry has been willing to standardize behind," said Andy Domeier, senior director of technology operations at SPS Commerce, a Minneapolis-based communications network for supply chain and logistics businesses. "I don't know anyone running service mesh on top of Consul just yet, but I know many people familiar with Istio and Envoy."

While other service mesh projects such as Linkerd appeal to enterprises because of their ease of use and now match most of Istio's advanced features, Istio is still the most customizable mesh, which is important in very large and complex environments where IT pros have the skills to take advantage of that flexibility.

"We'd already standardized on a GitOps model for driving our platform automation, and Istio was no exception," said T-Mobile's Searcy in a presentation. "We built out a small abstraction layer that allows us to manage our platform components in a very flexible way, [which] gives us varying degrees of granularity for installation, configuration and upgrades of the Istio control planes and gateways."

Still, managing the Istio architecture since pre-1.0 versions has been difficult for Searcy's team, he said.

"Let's just say it's been a wild ride," he said in his presentation. "As with any complex software, you need a good plan for lifecycle management -- just getting it installed everywhere is not enough."
