Why software resilience should be the real goal of DevOps
To improve your software development process, use DevOps to create a resilient system. Expert Matthew Heusser explains why reliability is no longer the goal.
Years ago, I worked for an organization that prided itself on its internal controls for software. Using a work order, developers had to get every box checked and then outline the steps to move the code to production. If there was a problem, systems would be down, and any "fix" would require a similar rigorous -- and documented -- process. That single-minded focus on reliability meant we had to batch changes together into projects and roll out less often. It became a vicious cycle.
But then came DevOps. I don't mean DevOps in the fancy CI/CD way, though. In this case, it's about dev and ops focused together on what matters, which, as is clear from the above example, cannot be reliability. It has to be software resilience.
A few years ago, my friend Noah Sussman suggested that instead of reliability, software systems should focus on resilience. Where reliability is focused on failure prevention, software resilience is more concerned that a single failure does not destroy the system.
Resilience in an e-commerce application
To understand this, look at Amazon's front page. Have you noticed it seems a bit like a group of boxes put together like Legos? There is a title bar with links to your profile and your orders. Underneath that, there are lists: "Fun gift ideas under $10," "Things you browsed recently," "Your recommendations" and so on.
Software resilience is more concerned that a single failure does not destroy the system.
Each of those containers is a combination of display code and a web service call. If the web service is down -- perhaps because a new change will roll out this very second -- then the box does not display. The Amazon homepage continues to function perfectly well, and it is likely a customer did not even notice the absence of the box. That is the classic definition of software resilience.