Bobbie Stempfley: Cybersecurity AI has a long way to go

Many cybersecurity vendors have embraced AI and machine learning, but CERT Division's Bobbie Stempfley says more work is needed around testing algorithms and validating results.

Artificial intelligence and machine learning have been the talk of RSA Conference for several years, but Roberta "Bobbie" Stempfley said the cybersecurity industry still has a long way to go before realizing algorithms' potential.

Stempfley, director of the CERT Division of Carnegie Mellon University's Software Engineering Institute (SEI), spoke at RSA Conference 2019 about the promise of cybersecurity AI applications, as well as her concerns about the technology. She joined CERT in 2017 after spending several years at the Department of Homeland Security and the Defense Information Systems Agency at the Department of Defense. 

As director of CERT, Stempfley not only leads the division's partnerships with law enforcement and government agencies, but also oversees the direction of the organization's research efforts. One of the areas CERT has focused on lately is how algorithms can be used to improve cybersecurity.

We spoke with Stempfley at RSAC about those research projects, as well as her outlook on cybersecurity AI and machine learning applications and how she weighs the hype versus reality. Here are excerpts from the conversation.

Editor's note: This interview has been edited for clarity.

What's your view of AI and machine learning technology and how they're being used in security products?


Bobbie Stempfley: We know cyber environments are large, complex and can change fast. That requires automation, but automating security tasks is hard. And that's led us to machine learning. You need to solve issues at speed and scale, and algorithms can help you do that.

Machine learning and AI have been big focal points at RSA Conference in recent years, and it feels like just about every security vendor says they have integrated the technology into their security products. Do you think we're at the point where machine learning and AI are fundamentally improving security?

Stempfley: I think we are in the very early stages of understanding the applications of machine learning and AI for cybersecurity. We're here at RSA [Conference], and we hear that everything in cybersecurity is being fueled by AI, and that's because everything is fueled by data. We have all of the security data -- we just need to manage it better. There are plenty of ways to collect, normalize and index data. It's complex, yes. But is it sophisticated? No.

We don't know all the ways that AI will support cybersecurity yet. But doing that effectively requires a discipline that hasn't been built yet. We have engineering disciplines and software development disciplines, but this is a new area. We don't know enough yet about how to test and validate results from these algorithms.

And when you have that technology, it expands your attack surface. Attackers can try to manipulate the results by feeding [the algorithms] bad data. That's really important to know when you're deploying AI in a contested space with adversaries, and that's why we need a discipline around this.
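To make the risk Stempfley describes more concrete, here is a minimal, purely synthetic sketch (not CERT research code) of how "bad data" -- in this case, flipped training labels -- can degrade a simple detection model. The data set, model choice and poisoning rate are illustrative assumptions only.

```python
# Illustrative sketch: label-flipping "bad data" degrading a classifier.
# Synthetic data only; not a real security data set or CERT methodology.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: model trained on clean labels
clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("clean accuracy:", clean_model.score(X_test, y_test))

# Poisoned: an attacker flips 30% of the training labels
rng = np.random.default_rng(0)
flip = rng.random(len(y_train)) < 0.30
y_poisoned = np.where(flip, 1 - y_train, y_train)
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)
print("poisoned accuracy:", poisoned_model.score(X_test, y_test))
```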

You mentioned data. Do you think the infosec industry is doing a good enough job sharing information about threats and vulnerabilities? And how could that affect cybersecurity AI applications?

Stempfley: There's a song called 'Standing Knee Deep in a River (Dying of Thirst),' and that's how I feel about security data sometimes. Yes, there's a lot more information sharing today, but it's still not enough. And the problem is we have so much of it, but it's not labeled or curated properly, and it's not good data to use for teaching or training your algorithms.

We need to do a lot of work on getting the data sets right. And we need to better identify the things that algorithms are actually learning from, because even if they're getting the answers right, they might be focusing on the wrong data and doing it for the wrong reasons.
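One common way to check whether a model is "getting the answers right for the wrong reasons," as Stempfley puts it, is to inspect which features it actually relies on. The sketch below uses permutation importance on synthetic data; the feature names are hypothetical, and this is one possible diagnostic, not the approach CERT uses.

```python
# Hedged sketch: inspecting what a trained model actually depends on,
# using permutation feature importance on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

feature_names = [f"feature_{i}" for i in range(10)]  # hypothetical names
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=3, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = RandomForestClassifier(random_state=1).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much held-out accuracy drops;
# large drops mark the features the model truly depends on.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=1)
ranked = sorted(zip(feature_names, result.importances_mean),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```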

Do you think it will be challenging to develop a discipline around machine learning and AI?

Stempfley: It's very challenging. But that's our job. We have a number of research projects at SEI right now. And it's no more challenging than developing a discipline around software engineering, and we as an industry have already done that. So, that's a start.

What types of research are you working on?

Stempfley: One of the areas we're focusing on is moving machine learning a little to the left, so to speak. Anomaly detection and threat analysis are definitely things that machine learning can be used for effectively in the future. But my feeling is [this]: Let's look at something simple. And one of those things, like I mentioned earlier, is looking for software vulnerabilities.

No one really codes by hand anymore. We're just taking pieces and chunks of existing code and putting them together. And no one likes to maintain existing code, either. If we could apply machine learning to reviewing code and finding potential areas of weakness, then that would be extremely valuable. You could take that code as a data set and run it through an algorithm to see, for example, what open source libraries are used and what that means for the software.
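As a rough illustration of treating code as a data set, the sketch below walks a repository of Python files and tallies which libraries they import. It is an editorial example under stated assumptions, not an SEI tool, and a real pipeline would feed results like these into further analysis of known weaknesses in those dependencies.

```python
# Illustrative sketch only: survey a codebase for the libraries it imports.
import ast
import sys
from collections import Counter
from pathlib import Path

def imported_modules(source: str) -> set:
    """Return top-level module names imported by a Python source file."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules

def survey(repo_root: str) -> Counter:
    """Count how many files in the repository use each library."""
    counts = Counter()
    for path in Path(repo_root).rglob("*.py"):
        try:
            counts.update(imported_modules(path.read_text(encoding="utf-8")))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that don't parse cleanly
    return counts

if __name__ == "__main__":
    for module, count in survey(sys.argv[1]).most_common(20):
        print(f"{module}: {count}")
```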

Beyond the need for discipline in cybersecurity AI applications, what else have you learned from your research projects?

Stempfley: I think we all need a better understanding of our environments. The ability to connect and link issues and avoid disruptions happens because you understand your environment, what's in it and how it can be disrupted. Without that understanding and visibility, it's hard to respond [to incidents].

There's a lot of security data about vulnerabilities and threats, but there's also a lot of management data about your environment's configurations and controls. The ability to link all of that management data with the business and security data is an evolution that has to occur.
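A minimal sketch of the kind of linking Stempfley describes might look like the following: joining security data (vulnerability findings) to management data (an asset inventory) so that findings carry business context. The tables, column names and prioritization rule are hypothetical illustrations, not a prescribed approach.

```python
# Minimal sketch: linking security findings to asset/management data.
import pandas as pd

# Security data: vulnerability findings keyed by host (hypothetical values)
findings = pd.DataFrame({
    "host": ["web01", "db01", "web02"],
    "cve": ["CVE-2019-0001", "CVE-2019-0002", "CVE-2019-0003"],
    "severity": [9.8, 7.5, 5.3],
})

# Management data: asset inventory with ownership and business criticality
inventory = pd.DataFrame({
    "host": ["web01", "db01", "web02"],
    "owner": ["payments team", "data platform", "marketing"],
    "business_critical": [True, True, False],
})

# Link the two views, then surface critical assets with severe findings first
linked = findings.merge(inventory, on="host", how="left")
prioritized = linked.sort_values(["business_critical", "severity"],
                                 ascending=False)
print(prioritized)
```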
