Are DevOps responsibilities in production unfair to developers?
When Nike.com gave developers live production support responsibilities, the shakeup unearthed fundamental questions about app design and support processes.
Change is good, unless that change happens to you.
Developers, who've always had a "what's not to like?" attitude about DevOps, have finally found something not to like: the prospect of having to wear a pager. Long a symbol of the operations side, the pager is now also a symbol of real DevOps, as developers at apparel and footwear brand Nike.com recently discovered when they were asked to be responsible for what happens to their code on live servers.
"Developers were all about Agile and DevOps -- until they realized they'd have to take on the ops part," said Ron Forrester, VP of engineering for Nike.com. At DevOps Enterprise Summit 2017 in San Francisco, he shared the developers' questions: "Are you telling me that I have to wear a pager? Why can't we just have a DevOps team? We already have a production support team, why can't we just call them DevOps and it's fine?"
It's not fine because DevOps is supposed to break down all the silos, including the idea that ops is solely in charge of production. Nike is trailblazing this largely unmapped DevOps shift left that requires developers to truly own their code, even when it's in production.
"If you take a Java developer, he or she is good at a certain programming language, not necessarily at DevOps platforms like Docker or Kubernetes," said Vinothini Raju, CEO at Bluemeric, a DevOps and cloud consultancy in Seattle and Bangalore, India. It takes a lot of training, she said, to give developers an operations perspective. Is it worth the effort?
Justifying the shift left
Developers, as noted by Python programmer and author Jeff Knupp, are highly skilled employees who are most valuable when they write code. Developers are both inefficient and underutilized when troubleshooting a server problem.
"When will I sleep? When will we get our features done?" IT ops support personnel with bloodshot eyes and a permanent sleep disorder might grin sadistically at these questions, but the questions do highlight a valid problem hidden in the goal of team transformation: who takes on which DevOps responsibilities, and with what job skills. Developers have a finite amount of time to add unique value to the company.
What does a company accomplish by pushing DevOps to its farthest point? DevOps must provide a business benefit or it's just transformation for transformation's sake.
"It's about cultural accountability," Forrester said, adding that the change revealed more about end-user experience than ever before.
Nike.com is turning these insights into better app development and design. For example, developers wanted to figure out the procedure for incident management on an application that is deployed globally across five regions. Multinational corporations, including Nike, have a global, always-on customer base, with surges in demand around key events, such as holidays and sales.
Ongoing ops support improvements
Alerts at 3 a.m. aren't a perk for anyone, but there are ways to ease the pain.
Nike.com found that the burden of incident response fell first and unfairly on consumer experience engineers because most alerts begin as a poor customer experience. Eventually, the problems would be traced to payment or inventory systems and handed off to those personnel to resolve, but a consumer experience engineer had already been woken up to triage something that ultimately was not their problem. Consumer experience engineers had to morph into first-line production support -- the very silo Nike.com aimed to tear down with shared DevOps responsibilities.
"This is how we know where to improve how we work," Forrester said.
Problem areas and escalation paths become clear over time, Raju said. Advanced deployment techniques, such as canary and blue/green methods, yield predictability and control in live environments. Proactive support, based on AI and machine learning, recognizes patterns and prevents 3 a.m. calls. With automated and intelligent tooling, Raju explained, that initial triage, mitigation and handoff can occur without human intervention.
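To make the idea of automated triage and handoff concrete, here is a minimal sketch in Python. It is not Nike.com's or Bluemeric's tooling. The team names, the `Alert` fields and the `route` function are all hypothetical, and the sketch assumes that monitoring already tags each alert with the back-end dependencies that are failing.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical mapping from a failing back-end system to the on-call
# rotation that owns it. Real names and systems would differ.
OWNERS = {
    "payment": "payments-oncall",
    "inventory": "inventory-oncall",
}

# Assumed fallback: the team closest to the customer-facing symptom.
DEFAULT_OWNER = "consumer-experience-oncall"


@dataclass
class Alert:
    symptom: str  # what the customer sees
    # Assumption: monitoring lists the dependencies that look unhealthy.
    failing_dependencies: List[str] = field(default_factory=list)


def route(alert: Alert) -> str:
    """Return the on-call rotation that should be paged first."""
    for dependency in alert.failing_dependencies:
        if dependency in OWNERS:
            # Hand off directly to the owner of the failing system.
            return OWNERS[dependency]
    # No known failing dependency: page the front-line team.
    return DEFAULT_OWNER


if __name__ == "__main__":
    print(route(Alert("checkout errors", ["payment"])))
    print(route(Alert("slow product page")))
```

In this sketch, a checkout failure traced to the payment system pages the payment owners directly, and the consumer experience rotation is paged only when no failing dependency is known.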
Effective support also relies on the knowledge base and communication channels, Raju said. She recommends a real-time chat tool with project-specific organization, such as Slack, with chatbots to communicate with the cloud hosting systems and other DevOps tools.
As for the former production support team, they were not put out on the street by ops-responsible developers. Production engineers collaborate with developers to create dashboards, tailor monitoring and alerts, and build flexible and resilient deployment platforms.