
Enterprise SREs guide devs through Kubernetes in production

Behind most enterprise developers who own and operate the services they create stands a platform engineering team that's teed up Kubernetes in its most manageable form.

SEATTLE -- Successful deployments of Kubernetes in production hinge on IT ops teams that break down container management complexity for developers into simple, reusable patterns.

This theme emerged among enterprise IT pros from large companies who shared their experiences with cloud-native app development at KubeCon here this week. While the shift to DevOps requires developers to own the services they create, those developers keep their services running smoothly with application templates and configuration guidelines created by site reliability engineers (SREs). Infrastructure teams at forward-thinking organizations maintain centralized, automated Kubernetes platforms, so developers aren't bogged down in infrastructure nuances while they troubleshoot apps.

"It's not enough to just throw Kubernetes over the wall at engineers and hope for the best to happen," said Greg Taylor, release engineering manager at Reddit Inc., the San Francisco-based online community platform.

Reddit's Kubernetes origin story

Reddit's early DevOps approach relied on infrastructure teams to handle service deployments and operations, but it wasn't fast enough, Taylor said. In early 2016, Reddit's sysadmins began to reinvent themselves as internal service providers, with the goal of softening the Kubernetes learning curve for software engineers and then staying out of their way.

This project began with a set of application definitions Reddit calls baseplates, which ensure a consistent general structure for each service. Next, the Reddit team generated Dockerfiles and CI configurations to guide developers as they created application services. Google's Skaffold tool made local app development environments as similar to Kubernetes in production as possible without requiring developers to have deep Kubernetes expertise.
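Reddit didn't share its baseplate contents in detail, but the Skaffold piece of such a setup can be remarkably small: a build target plus a pointer to the manifests the platform team generates. A minimal sketch, with hypothetical image and manifest paths:

    # skaffold.yaml: build the service image and deploy the generated
    # manifests; 'skaffold dev' rebuilds and redeploys on each code change,
    # so the local loop mirrors the production deployment path.
    apiVersion: skaffold/v1
    kind: Config
    build:
      artifacts:
        - image: registry.example.com/myservice   # hypothetical image name
    deploy:
      kubectl:
        manifests:
          - k8s/*.yaml   # manifests generated from the service template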


Reddit used Drone for CI and Spinnaker for CD to automate deployments and rollbacks on Kubernetes infrastructure, and the Reddit infrastructure team added an internal knowledge base, supported by training, to guide developers on how to operate their services post-deployment. Behind the scenes, the infrastructure team set up guard rails, such as Kubernetes resource limits and network policies, to minimize incidents caused by inexperienced developers.
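Taylor didn't walk through Reddit's exact policies, but guard rails like the two he named are typically declared per namespace as ordinary Kubernetes objects. A sketch, assuming a hypothetical team namespace:

    # LimitRange: containers that don't declare CPU/memory requests or
    # limits inherit these defaults, so one runaway pod can't starve a node.
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: container-defaults
      namespace: team-a               # hypothetical team namespace
    spec:
      limits:
        - type: Container
          defaultRequest:
            cpu: 100m
            memory: 128Mi
          default:
            cpu: 500m
            memory: 512Mi
    ---
    # NetworkPolicy: deny all ingress to the namespace by default; each
    # service must opt in to the traffic it accepts with a narrower policy.
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-ingress
      namespace: team-a
    spec:
      podSelector: {}                 # selects every pod in the namespace
      policyTypes:
        - Ingress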

"We point engineering teams down a well-manicured path from concept to production," Taylor said. "That path minimizes their exposure to the underlying technology."

This curated, cookie-cutter approach to Kubernetes management was common practice among KubeCon presenters. Fairfax Media, Australia's largest media company, which owns publications such as The Sydney Morning Herald, calls its app templates "skeletons." Airbnb calls them "controllers." Each company's templates require a different list of basic ingredients, but all are built with the same goal in mind: Minimize the amount of time application owners think about the Kubernetes back end.
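Whatever the label, these templates tend to pin down the deployment structure, labels and health checks, and leave the application team only a few blanks to fill. A generic sketch of the idea, with all names hypothetical:

    # A skeleton Deployment: the platform team owns the shape; the app
    # team supplies little more than the service name, image and port.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myservice                  # filled in by the app team
      labels:
        app: myservice
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: myservice
      template:
        metadata:
          labels:
            app: myservice
        spec:
          containers:
            - name: myservice
              image: registry.example.com/myservice:1.0.0
              ports:
                - containerPort: 8080
              readinessProbe:          # standard health check baked in
                httpGet:
                  path: /health
                  port: 8080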

Show floor at KubeCon 2018, where a major focus was successful Kubernetes deployments.

Large-scale Kubernetes shops call in platform vendor reinforcements

As more enterprises deploy Kubernetes in production -- and with bigger scope -- some SRE teams that have simplified container infrastructure for developers internally look to platform vendors such as Google and Red Hat to help them handle the complexities that come with high scale.

"There are issues around things like domain name system (DNS) latency that you might not encounter if you're not at large scale," said Erik Rogneby, senior manager of infrastructure development at media company USA Today Network, based in McLean, Va. USA Today managed its own hybrid cloud Kubernetes deployments in on-premises data centers and AWS in 2016 and 2017, but this year began to migrate to Google Kubernetes Engine (GKE).

"When we rolled our own Kubernetes, our teams ran the Kubernetes environment, including etcd, which is like running a mini-cloud yourself," Rogneby said. "Networking, service discovery, ingress and load balancing became much simpler [with the move to GKE]."

USA Today's infrastructure engineering teams still manage clusters and deployment pipelines, but they don't have to worry about the resiliency of etcd clusters, and upgrades between versions of Kubernetes are much less disruptive, Rogneby said.

Other large-scale Kubernetes users keep their clusters in private data centers but use a packaged Kubernetes distribution such as Red Hat's OpenShift Container Platform to ease container management.

"We have 18 clusters and fewer than 10 people to run them," said Mark DeNeve, systems engineer at HR services company Paychex in Rochester, N.Y., which has used OpenShift since it first supported Kubernetes with version 3.0 in 2015. "Having Red Hat for support means DevOps teams don't have to figure out why something is broken or track all the different randomly named projects [within Kubernetes]."

Enterprises dream of an easy button for Kubernetes in production

Even packaged and hosted Kubernetes distributions could be much simpler, users said.

"I'd love to be able to create a complete manifest of everything that's needed to run apps, including third-party tools like [HashiCorp] Vault, or service patterns like pub/sub queues," Rogneby said.

DeNeve said he hopes OpenShift 4.0, due out in the second quarter of 2019 and previewed by Red Hat here this week, will smooth the cluster installation and upgrade process. With 4.0, Red Hat plans to roll out a utility called the OpenShift Installer, which aims to give on-premises users the kind of automated cluster spin-up found in managed cloud services such as GKE, Amazon EKS and Microsoft's Azure Kubernetes Service.

The bulk of Red Hat OpenShift customers still run Kubernetes versions between 1.6 and 1.9 because of difficulties with the upgrade process, said Michael Barrett, director of OpenShift product management at Red Hat. The company hopes to change that with Operators for each component of the Kubernetes framework, which automate ongoing cluster maintenance and move container clusters toward self-driving systems.
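An Operator extends the Kubernetes API with a custom resource whose desired state a purpose-built controller continuously reconciles. The etcd Operator that came out of CoreOS (now part of Red Hat) is an early example: a user declares an etcd cluster in a few lines, and the Operator handles deployment, failover and version upgrades. A minimal custom resource in the style of that project's documentation:

    # EtcdCluster custom resource: the etcd Operator creates and then
    # maintains a three-member etcd cluster at the requested version.
    apiVersion: etcd.database.coreos.com/v1beta2
    kind: EtcdCluster
    metadata:
      name: example-etcd
    spec:
      size: 3
      version: "3.2.13"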
