Nvidia AI security architect discusses top threats to LLMs
Richard Harang, Nvidia's principal AI and ML security architect, said two of the biggest pain points for LLMs right now are insecure plugins and indirect prompt injections.
Nvidia's principal AI security architect offered insights from "a year in the trenches" red teaming LLMs during a Black Hat USA 2024 session Wednesday.
The session, titled "Practical LLM Security: Takeaways From a Year in the Trenches," was led by Richard Harang, Nvidia's principal AI and machine learning security architect. During the Black Hat session, Harang covered the Nvidia AI Red Team's findings on which kinds of attacks were most common, which were most impactful, and how to assess the security posture of large language models (LLMs) in one's own environment.
In a preview for the session, Harang told TechTarget Editorial that one of the most challenging attacks has been indirect prompt injection, a type of attack in which an LLM reads and responds to an instruction from a third-party source. Although techniques such as retrieval-augmented generation (RAG) can make LLMs produce more accurate and up-to-date information, the same functionality also enables attackers to insert their own content into an LLM's document database.
"Indirect prompt injection means I stick something into this document database, and then it comes back," Harang said. "Later on, some other user is using this LLM, they happen to pull up that document, and then that prompt injection activates. It's not like jailbreaking, where I'm the person who's interacting with it and I get the response that I'm trying to elicit from it. I'm interacting with it, and because this third party is able to put content in, suddenly they have some influence over what it is that I see as the user."
The second major pain point Harang said he sees in LLM implementations involves plugins: third-party code that augments the functionality of a model. Because LLMs are generally unable to give up-to-date information and are beholden to training data that may be weeks, months or years old, Harang said, something like a live weather plugin allows a model to obtain current, accurate information for specific queries.
The issue with plugins, he explained, is that they might not be built securely. Attackers can potentially exploit plugins to get downstream access to the model itself. "Sometimes, especially in the presence of indirect prompt injection, people have different ways that they can inject input into the system," he said.
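The sketch below illustrates that anti-pattern under the assumption that a plugin accepts the model's free-text output directly; the fetch_plugin_insecure() function, the call_llm() stub and the URL are hypothetical and not drawn from any real plugin framework.

```python
# Minimal sketch of the anti-pattern: free-text model output flows straight
# into a plugin that holds real privileges. Everything here is hypothetical.

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call. If the context contained an indirect
    # prompt injection, the attacker effectively chooses this string.
    return "http://169.254.169.254/latest/meta-data/"  # internal endpoint

def fetch_plugin_insecure(llm_output: str) -> str:
    # The plugin trusts whatever the model produced and would fetch it using
    # the service's own network access and credentials, giving the attacker
    # downstream reach through the model.
    url = llm_output.strip()
    # response = requests.get(url, headers={"Authorization": SERVICE_TOKEN})
    return f"Plugin would fetch: {url}"

print(fetch_plugin_insecure(call_llm("Find the weather API endpoint to call.")))
```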
To address and fortify against these issues, Harang advocated for "old-fashioned application security." This includes creating permissions so that users can access only the documents and parts of an LLM application they're supposed to, mapping out security and trust boundaries, and establishing appropriate access controls.
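A minimal sketch of that kind of permissioning, assuming a simple in-memory document store and access control list (both hypothetical), might look like this: retrieval is filtered by the requesting user's existing permissions before anything becomes LLM context, so the model grants no new privileges.

```python
# Minimal sketch: enforce document permissions before retrieval results can
# ever reach the model. DOCUMENTS and ACL are hypothetical stand-ins.

DOCUMENTS = {
    "public-handbook": "Company handbook text...",
    "finance-q3": "Confidential Q3 revenue figures...",
}

# Access control list mapping users to the documents they may read.
ACL = {
    "alice": {"public-handbook", "finance-q3"},
    "bob": {"public-handbook"},
}

def retrieve_for_user(user: str, query: str) -> list[str]:
    """Only documents the user is already authorized to read can become
    LLM context; the model adds no privileges of its own."""
    allowed = ACL.get(user, set())
    return [
        text for doc_id, text in DOCUMENTS.items()
        if doc_id in allowed and query.lower() in text.lower()
    ]

print(retrieve_for_user("bob", "revenue"))    # [] -- bob cannot see finance docs
print(retrieve_for_user("alice", "revenue"))  # ['Confidential Q3 revenue figures...']
```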
More specifically for plugins, Harang said organizations should harden them to the point that they would be comfortable exposing those plugins to the internet. In other words, organizations should keep plugins isolated from authentication and authorization information, and keep that authentication information isolated from the LLM itself.
"You want the plugin itself to be parameterized -- to have all of those parameters validated, and for it to send back information in a sanitized, validated, parameterized format so that at each step, you are reducing the ability of an attacker to get either their malformed inputs into these plugins or databases or reducing the attacker's ability to have their inputs then proceed back into another iteration of this LLM loop," he said.
As a technology company, Nvidia has seen tremendous growth in recent years -- this year in particular. Though the company has previously been known primarily for its GPUs, Nvidia's work on AI-capable data center chips has sent its market value soaring. Asked about his experience as a security architect for Nvidia -- a large company becoming much larger -- he said it has been "a lot of fun."
"I've been in the intersection of machine learning, security and privacy topics for a while, and a lot of that has involved applying ML to security problems. And now, looking at sort of stepping further and further into the security of AI applications, it's a really interesting and an exciting space to be at," Harang said. "And because we do so many models and we build so many AI-powered applications inside Nvidia, it's a fantastic place, in my opinion, to be exposed to what's going on and maybe have a chance to impact the industry in a positive way. It's been really exciting and it moves really fast. But overall, it has been a lot of fun. I've really enjoyed it."
Alexander Culafi is a senior information security news writer and podcast host for TechTarget Editorial.