Is today's CrowdStrike outage a sign of the new normal?
A CrowdStrike update with a faulty sensor file has global implications for Windows systems. But competitors need to limit the finger-pointing in case it happens to them.
I woke up this morning to text messages and alerts about widespread outages from CrowdStrike and Microsoft 365. Reports conflict as to whether or not the outages are related, but Microsoft has said the CrowdStrike issue is separate from the outage that affected Azure systems in the Central U.S.
While Microsoft has resolved its Azure-specific issues, CrowdStrike is still dealing with the implications of an update that pushed out a faulty sensor file, causing Windows systems to crash with the blue screen of death. This issue is affecting millions of desktops, laptops and servers that use the CrowdStrike Falcon platform, and its consequences extend far beyond those machines.
The CrowdStrike issue is especially problematic because its workaround requires hands-on access to each affected machine. This hands-on approach can be avoided on devices that have remote keyboard, video and mouse capabilities, like some servers or machines with Intel vPro, but those account for a relatively small percentage of devices.
Making matters more complicated, many endpoints use BitLocker disk encryption, and the keys might not be available locally, or at all, while the system is unresponsive. Also, modifying the system in Safe Mode requires admin rights, so normal user accounts won't have access to make the changes. This means either an admin needs to be present or the admin credentials need to be shared with the end user so they can enter them. (Obviously, the latter is a huge security risk that results in further work down the road to change all the local admin passwords.)
It appears, however, that systems left alone in a reboot loop will eventually fix themselves, since the machine is contacting CrowdStrike's update servers before the blue screen of death is triggered. It's just a matter of when the update will make it down, given the overwhelming demand placed on CrowdStrike's content delivery network.
Competitors have a tough needle to thread
While much of the eventual impact on CrowdStrike's reputation depends on its technical and organizational response to this issue, other vendors will no doubt use this as an opportunity to chip away at CrowdStrike's leadership position. This is always a tough needle to thread because next week another vendor might be the cause of another high-profile problem. Finger-pointing is rarely an effective long-term strategy.
The path for competitors might be to demonstrate how their processes avoid this kind of thing -- how they test, validate and deploy without altering the validated package (I'm not saying that's what happened with CrowdStrike, but something had to mess up that driver file). But competitors can't put themselves too far out there because they will lose customer trust if an issue happens to them.
If anyone stands to gain from this, it's Microsoft. Third-party vendors always have to justify why customers should buy their products when Microsoft is already on their systems. Often, this is easy enough to do, but customers are always looking for reasons to cut costs. Even though customers might be reluctant to put all their eggs in one basket, they could justify it by saying that increasing the number of vendors increases the chance that any one of them will have a huge problem.
Is this the new normal?
The CrowdStrike issue, coupled with a relatively run-of-the-mill outage of Azure -- not to minimize it, but cloud outages happen from time to time -- is understandably causing some uproar up and down IT organizations. All of a sudden, things that seemed strong now seem fragile.
I'm reminded of the fable "The Scorpion and the Frog." If you're not familiar with it, a scorpion asks a frog to help him cross a river. The frog hesitates, fearing the scorpion will sting him, but the scorpion convinces the frog that he won't, since they would both die. Halfway across the river, though, the scorpion stings the frog anyway. As they sink, the frog asks why, to which the scorpion replies, "I can't help it. It's in my nature."
This stuck out to me today because, like it or not, we're all beholden to technology for our daily lives, and as good, stable, effective or powerful as any technology is today, it's still software. In fact, we've built up a software-driven world around us, built on processes that we think are strong enough to keep things like this from happening -- until they're not.
The fact that these outages came from CrowdStrike and Microsoft is almost irrelevant. It could've been anyone. This doesn't mean people shouldn't be upset with these vendors or take steps to try to prevent this kind of thing from affecting them in the future. But it means we should remain aware that in a software-defined, technology-driven world, bugs -- and outages -- happen.
They're in its nature.
Gabe Knuth is the senior end-user computing analyst for TechTarget's Enterprise Strategy Group.
Enterprise Strategy Group is a division of TechTarget. Its analysts have business relationships with technology vendors.