Microsoft, SecOps pros weigh kernel access post-CrowdStrike

Microsoft will explore alternatives to direct kernel access for partners following the CrowdStrike outage. But some IT pros worry that change could do more harm than good.

Microsoft said it will explore alternatives to direct kernel access for partners following the CrowdStrike outage. But some SecOps pros question whether that will prevent further stability issues.

The outage began July 19 when an update to CrowdStrike's Falcon software on Windows systems contained a bug and failed to load properly. Because CrowdStrike's software ran as a device driver in the core Windows OS kernel, also known as Ring Zero, its failure prompted a kernel panic and a failure of the operating system. As a result, some 8.5 million Windows systems crashed globally, snarling airports and public transit as well as disrupting healthcare and financial services.

In the aftermath of the incident, some SecOps experts questioned why Microsoft allows partners such as CrowdStrike to have direct kernel access. By contrast, a previous CrowdStrike update that caused a kernel panic on Linux systems, but ran as an eBPF program outside the kernel, was addressed by Red Hat earlier this year without such paralyzing effects.

An initial explanation from Microsoft pointed to a 2009 European Union antitrust ruling that requires it to grant third parties the same access to its OS that it has. But late last week, the company indicated it will rethink that stance.

"This incident shows clearly that Windows must prioritize change and innovation in the area of end-to-end resilience," wrote John Cable, vice president of program management for Windows servicing and delivery at Microsoft, in a company blog post on July 25. "Examples of innovation include the recently announced VBS [virtualization-based security] enclaves, which provide an isolated compute environment that does not require kernel mode drivers to be tamper resistant, and the Microsoft Azure Attestation service, which can help determine boot path security posture."

Pros and cons of kernel access

Debate about whether kernel access is necessary for cybersecurity tools isn't new, but it is newly in the spotlight given the high-profile nature of the CrowdStrike incident. Proponents of kernel access say it's necessary to provide complete system visibility with high performance, enforce security measures as systems boot, and detect threats such as bootkits and rootkits.

"There are very few security products that don't have kernel access," said Kyler Middleton, senior principal software engineer at healthcare tech company Veradigm. "They require very deep integration and access to the Windows kernel in order to keep it safe."

In scenarios such as the faulty CrowdStrike update, kernel failure is an expected and secure behavior, Middleton said.

"This is a hard problem to solve. Most kernel developers I've talked to say that if something at Ring Zero/kernel-level fails to load, the OS should not load. It'll be in an unreliable and unpredictable state, which could lead to a security issue," she said.

On the other end of the spectrum, some software engineers believe it's past time for Microsoft to provide alternatives to direct kernel access.

"There is simply no need for all of this to run as dangerous kernel modules," said David Strauss, co-founder and CTO at WebOps service provider Pantheon. "Microsoft has had, since 2009, to do more than dangerously comply with the EU order. What they should have introduced is a sandboxed in-kernel or userland mechanism."

VBS enclaves and the Azure Attestation service are promising alternatives to the way software vendors such as CrowdStrike run currently, Strauss said.

"If CrowdStrike wanted to isolate their fast-moving code out of their main kernel module … they could use something like VBS to do so. Usually, there's something like that in malware scanners, even if not VBS, because of polymorphism," Strauss said. "Azure Attestation takes an approach of comprehensive integrity management, leaving no unverified room for malware to inhabit. This is a superior, modern alternative to malware scanning. But this specific product is only available on Microsoft's cloud."

Still, some SecOps pros doubt that a shift away from kernel access in newer products will prevent future outages.

"The idea of creating enclaves is not new," said Keith Townsend, president at The CTO Advisor, a Futurum Group company. "While it can work in theory, it takes moving the entire Windows ecosystem, from ISVs to customers, to adopt the new approach. Ask any IT manager, and they'll tell you they are always a handful of applications away from being able to turn off an insecure access method."

Microsoft would run into conflicts with some of its own products if it required VBS across the board, for example, said Adrian Sanabria, a board member at Security Tinkerers, a non-profit cybersecurity organization.

"I remember having to disable memory integrity -- one of the protections related to VBS -- to be able to participate in a cybersecurity training course, which required using VMware virtualization software," Sanabria said. "In addition, the new Snapdragon ARM-based laptops that Microsoft has been promoting can't run third-party antivirus and can't use … virtualization-based security unless you disable the secure mode that only allows users to run vetted software from the Microsoft Store."

Is it SecOps or Sec vs. Ops?

For one industry observer, the dynamic reflected in the CrowdStrike outage isn't about the technical details of how tools operate but about an organizational disconnect between the IT security teams that choose tools and the operations teams that must respond when those tools fail.

"Because of well-known breaches, like the [2014] Sony Pictures breach, security teams have gotten a lot more power than operations people and can mandate to the ops teams what products to roll out across the entire environment," said Rich Lane, IT director at the City of Medford, Mass., who said he saw these patterns develop as an IT practitioner at past jobs, such as at Bain Capital during the aftermath of the Sony breach, and as an executive at security operations software vendor Netenrich. "And [ops has] no ability to push back and say, 'I think this is a bad idea,' or 'What's your update methodology?' That's what led to this whole fiasco."

Whether Microsoft changes its kernel access approach matters less, in Lane's view, than ops having a seat at the table when security is choosing and implementing products, especially given the disruption a change in software implementation might cause.

"[Removing kernel access is] something [Microsoft] probably should have done 25 years ago -- just the fear is going to be, 'How many hundreds of products is it going to break?'" he said. "The hands-on people have to be there to kick the tires on products."

Many security breaches are accomplished without malware, Sanabria said. But anti-malware tools are sometimes chosen for political reasons.

"As a CISO, you never want to have to explain during a breach why you chose not to use AV [anti-virus] software," he said. "Many security leaders buy third-party AV software for 'CYA' purposes. An opportunity to share responsibility for security is also an opportunity to share the blame when things go wrong, which could save your job as a security leader."

For Middleton, these organizational problems do exist in some places, but they weren't the main cause of the CrowdStrike incident.

"I don't think this is a case of 'picked the wrong product,'" she said. "By all accounts, CrowdStrike is a fantastic product, and the dev team made a mistake, and the unit testing didn't catch it, and the asynchronous way data files are deployed triggered the bug later all at once, without testing. It's a hurricane of bad luck."

Beth Pariseau, senior news writer for TechTarget Editorial, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.
