Upstream Istio service mesh hones IT ops user experience

The Istio service mesh community has revealed further plans to simplify the project's notoriously difficult 'Day 2' management, which can't all be addressed by downstream vendors.

Istio maintainers pledged improvements to the service mesh installation and upgrade process and previewed new gateway and telemetry APIs, as enterprise users said its operation remains challenging.

That was the theme of discussions and presentations at the recent IstioCon virtual event, which featured listening sessions geared toward understanding what features users of the open source service mesh want added or improved in future releases, along with updates on roadmap plans for 2022.

"We want to make it so that users spend less time troubleshooting the service mesh itself and are then able to troubleshoot their services with the service mesh," said Eric Van Norman, senior software engineer at IBM, in an IstioCon virtual session.

Istio is among the most prominent examples of service mesh technology, which distributes network management, security and observability for microservices apps among a set of software components called sidecar proxies. Service mesh use has risen along with the Kubernetes container orchestration framework, where it can improve performance and offer better visibility into container-level traffic than traditional network architectures.

Istio has been in the spotlight in part because of its prominent backers -- it was launched by IBM, Google and Lyft in 2017, and its governance remained under Google's control until the cloud provider donated it to the Cloud Native Computing Foundation (CNCF) last month. Istio has not achieved the same dominance as Kubernetes among open source projects, in part due to the two-year holdout by Google before the CNCF donation, and in part due to Istio's reputation for both advanced features and difficult management. Another CNCF service mesh project, Linkerd, has won enterprise early adopters based on ease of use; another open source competitor, Kuma, and its commercial version, Kong Mesh, tout multi-cluster management features that open source Istio lacks.

Why Istio's upstream ease of use matters

Istio's complexity has also created opportunities for third-party downstream vendors, from DevOps platform purveyors and cloud service providers such as Red Hat and Google, to service mesh specialists such as Tetrate and Solo.io. All of these vendors add features that help ease service mesh management or handle it on behalf of users, and that's unlikely to change, according to Brad Casemore, an analyst at IDC.

"It's difficult to envision a time when the upstream releases will be adequately adapted, polished and simplified for the broader enterprise market," Casemore said. "The vendor opportunity will still be there to take the code and provide finishing touches."

However, some users have stuck with the open source Istio service mesh because third-party platforms are too broad or expensive for them. Moreover, such products can't address all of the foibles upstream Istio has baked into its code.

"We considered using Google's managed Istio offering, [but] it did not provide the flexibility we need, and it's already deprecated in lieu of Anthos," said Ben Snyder, senior DevOps engineer for the software monetization division of global IT services and consulting firm Thales, where a team of four people manages thousands of Kubernetes pods connected via Istio.

Figure: Istio service mesh roadmap presentation at IstioCon 2022

Istio on Google Kubernetes Engine was deprecated by Google as of December 2021. Anthos Service Mesh is available separately from the full Anthos Kubernetes platform, but Snyder said it would be difficult to justify paying for the service for his small team. 

Another Istio user, who is also an Anthos customer, said rolling upgrades between Istio versions have persistently caused outages in ingress gateways, which manage traffic coming into Kubernetes clusters.

"It's not an Anthos problem," said Jeferson Pereira, a cloud engineer at a financial services company. "It's an Istio problem. I've also tested Tetrate and Solo.io [products], and it's the same."

In part, Pereira has encountered the issue because his workloads require an ingress filter to be applied at the global gateway level, which means that he can't follow the standard upgrade process between versions of open source Istio. There has been improvement in the upgrade experience as the upstream service mesh evolved, Pereira said -- upgrades that used to require at least a month to fully validate prior to version 1.9 took about 30 minutes with version 1.13, released in February.

But the complexity of Istio means it should only be used when the intricacy of an IT infrastructure warrants it, Pereira said.

"I don't usually recommend Istio," he said. "In discussions with other IT people, [they say], 'I'm looking to go to Istio because I want the app telemetry.' But the first question is, 'Do you have something in the environment today like Prometheus or Grafana -- have you even started with the basics of monitoring and telemetry?' The answer is usually no."

In Pereira's case, Istio's fine-grained security features, such as the built-in support for mutual TLS it has had since its earliest versions, are needed for more than 700 microservices applications in his company's highly regulated financial services environment.

Istio community reckons with Helm

Snyder, Pereira and other users who gave feedback to Istio maintainers during IstioCon cited its fluctuating support for Helm charts as especially disruptive. Helm version 2 was deprecated as an Istio installation tool in early 2020; Helm version 3 support was then reinstated in early 2021 but remains at the alpha stage. Istio also added a command-line tool, istioctl, and an associated Kubernetes Operator, but these are not recommended by the community for production use.

As a result, open source Istio service mesh users such as Thales' Snyder were left with sometimes painful workarounds to achieve the right Envoy configuration customizations.

"Some of our configuration options are now in this weird Istio YAML file that doesn't directly translate to what's happening under the hood," Snyder said. "We have had to create an overlay to manipulate the [Envoy] manifest because even today the istioctl overlay files do not allow you to get into some of the inner workings."

Helm support is expected to move from the alpha stage to beta in the coming year, IBM's Van Norman said during the IstioCon roadmap session. He also acknowledged questions about the future of istioctl and the Istio Operator.

"People think we want to deprecate it, but that's not the case," he said during the session. "It may not be the recommended approach, but at the same time, we don't want to get rid of it. ... We went through some turmoil with getting rid of Helm the first time -- I don't want to see that, personally, again."

Van Norman and co-presenter Louis Ryan, principal engineer at Google and an Istio core committer, highlighted recent improvements to Istio's upgrade process, including support for revision tags with Istio 1.10 in May 2021, which reduced the number of manual changes required to upgrade the Istio control plane. While still in alpha, the new Helm support is simplified compared with previous versions, which had to account for the now-removed Helm Tiller component. The Istio community also let users skip releases as of version 1.11 and extended support windows in September 2021.

Future versions of istioctl analyze, a diagnostic tool that detects potential issues with Istio configurations, will include support for environments other than Kubernetes clusters, Van Norman added during the roadmap session.

Istio roadmap includes new gateway, telemetry APIs

The Istio community is also developing two new APIs to improve the service mesh's integration with Kubernetes gateways and to simplify telemetry data gathering.

Istio is the first service mesh project to implement support for an alpha-stage Kubernetes gateway API, which will replace older service and ingress APIs within the container orchestrator. The new gateway API will standardize a way to install and update network gateways from both open source projects and third-party vendors using a streamlined process.

"[Currently] we have these two parts, the deployment itself and the configuration for it, and these need to be kept in sync," said John Howard, Istio network lead and software engineer at Google, during an IstioCon presentation. "Often that means re-running Helm installs or the istioctl install or something similar, just because you want to expose, say, some new port for your application."

The new gateway API uses a single gateway object to provision multiple gateways both inside and outside of container clusters, so IT pros can swap out gateways easily and automatically propagate gateway configuration data to multiple layers of infrastructure.

The new Istio telemetry API, also in alpha, similarly reduces the number of manual configuration steps required to set up data collection for tools such as OpenTelemetry distributed tracing via the service mesh. It also consolidates configurations for mesh-level, namespace-level and workload-level data gathering into a single process, where they previously were scattered across multiple configuration files.
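
One way to picture the mesh-level, namespace-level and workload-level settings described above is as a lookup in which the most specific matching scope wins. The Python sketch below is a toy model of that idea only, and it rests on assumptions: the `TelemetrySetting` class, the `resolve` function, the namespace and workload names and the sampling percentages are all hypothetical, and none of it is Istio's code or its actual configuration schema.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical ranking of scopes: a higher number means a more specific scope.
SCOPE_RANK = {"mesh": 0, "namespace": 1, "workload": 2}


@dataclass(frozen=True)
class TelemetrySetting:
    """Toy stand-in for one telemetry setting (not Istio's real schema)."""
    scope: str                       # "mesh", "namespace" or "workload"
    sampling_percent: float          # illustrative trace sampling rate
    namespace: Optional[str] = None  # set for namespace and workload scopes
    workload: Optional[str] = None   # set for workload scope only


def applies(setting: TelemetrySetting, namespace: str, workload: str) -> bool:
    """Return True if the setting covers the given workload."""
    if setting.scope == "mesh":
        return True
    if setting.scope == "namespace":
        return setting.namespace == namespace
    return setting.namespace == namespace and setting.workload == workload


def resolve(settings: Iterable[TelemetrySetting],
            namespace: str, workload: str) -> Optional[TelemetrySetting]:
    """Pick the most specific setting that applies, or None if none does."""
    candidates = [s for s in settings if applies(s, namespace, workload)]
    return max(candidates, key=lambda s: SCOPE_RANK[s.scope], default=None)


# Hypothetical example data: one setting per level.
SETTINGS = [
    TelemetrySetting(scope="mesh", sampling_percent=1.0),
    TelemetrySetting(scope="namespace", sampling_percent=10.0,
                     namespace="payments"),
    TelemetrySetting(scope="workload", sampling_percent=100.0,
                     namespace="payments", workload="checkout"),
]
```

In this model, a call such as `resolve(SETTINGS, "payments", "checkout")` selects the workload-level entry, while a workload in a namespace with no setting of its own falls back to the mesh-wide default.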

"Our main goal with the initial Istio telemetry [approach] was to provide a 'one size fits most' [mechanism] but obviously, at high scale or for different use cases, you're going to want to tailor that," said Douglas Reid, an engineer at Google, in an IstioCon presentation. "We needed a way to support [various] use cases in a way that wasn't just playing with Envoy filters that you had to upgrade every time you upgraded Istio."

Beth Pariseau, senior news writer at TechTarget, is an award-winning veteran of IT journalism. She can be reached on Twitter @PariseauTT.
