Rethinking business continuity before the next big IT outage
For one industry analyst, the most significant gap exposed by the CrowdStrike outage is between IT and business -- not Sec and Ops.
Modern cloud and automation technology helps explain why the 2024 CrowdStrike outage became so widespread, but one industry analyst said the likely antidote is decades old.
Business continuity is a longstanding practice, as is enterprise risk management, but there's a gap between these fields and IT, according to Charles Betz, an analyst at Forrester Research and one of the authors of a report that called for a redefinition of enterprise resilience in the wake of the CrowdStrike incident.
"On the actual formal enterprise risk management side, I don't know that IT risks are given the kind of standing that they should have in the overall enterprise risk registry," Betz said in an interview with Informa TechTarget's IT Ops Query podcast.
While the CrowdStrike outage remains the largest in IT history, another incident of this magnitude is all but inevitable in a hyperconnected digital world. The CrowdStrike incident came as a surprise to 83% of the 1,000 business and IT executives surveyed by incident management vendor PagerDuty in December 2024, but 88% said they believe another major incident will occur in the next 12 months.
There's still an opportunity to learn from some of the worst-case scenarios caused by the CrowdStrike outage, Betz said. He cited the example of Delta Air Lines, which endured a longer disruption to its business than most companies affected by the outage. Delta's delayed recovery was attributed in part to a failure in its crew-tracking system.
"The bottom line was that the whole flight operations and crew management [system] took a major shock that rippled, and it doesn't matter if their [IT] system was up and accurate, they were up against the laws of physics," he said. "Computers move at the speed of light. … Airplanes don't, and so even if they knew what they needed to do, it just took time and it took fuel to get stuff back to where it needed to be. Now, this is not a failure of IT disaster recovery. This is truly a failure of business continuity."
In other engineering disciplines -- including aerospace engineering -- resilience engineering and business resilience are better understood than they are in IT, particularly as the focus for technologists has trended toward velocity in the cloud era, Betz said.
"We in the [IT] industry are still babies at doing this," he said. "IT needs to start adopting some of these practices from domains that understand this a lot better."
Meanwhile, research by Forrester pointed to another aspect of business continuity where what's old is new again: the configuration management database (CMDB).
"We've got, a couple years running now, statistical data that shows that [CMDBs] are strongly correlated with better outcomes across a wide range of IT priorities and capabilities," in terms of business resilience, Betz said, citing his July 2022 Forrester trend report, "The State of Modern Technology Operations Maturity."
Traditional CMDB systems have a reputation for being unwieldy and prone to failure, that report acknowledged. Betz said he believes the main reason they correlated with success in his research is that they indicated an understanding by IT teams of all their technology assets and how their businesses depend on them. He named Scotiabank and the Bank of England as examples of successful organizations.
"Don't get obsessed with the term CMDB," he said. "Think of it as [solving] the data management problem of the business of IT."
Beth Pariseau, senior news writer for Informa TechTarget, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Reach out @PariseauTT.