The agentic AI 'lethal trifecta': What CISOs should know
The very capabilities that make an AI agent useful also make it dangerous. Here's what CISOs should know about the agentic AI lethal trifecta, and what they should do about it.
By now, every CISO has probably heard the phrase lethal trifecta tossed around in AI security discussions. The term refers to a combination of three agentic AI properties that, together, make agents vulnerable to attack and put the enterprises using them at massive risk.
Programmer Simon Willison is credited with coining the term lethal trifecta as it relates to agentic AI. Unfortunately, the cybersecurity field does not currently agree on a universal definition: different cybersecurity analysts and AI researchers often pick different trios of properties. And, of course, there's no need to stop at three, but we lack a cutesy term like quadfecta or quintfecta to describe a longer list.
That said, conversations about the agentic AI lethal trifecta often center on the following three properties, as initially described by Willison:
- Agent access to private or sensitive information, whether personal information about staff or customers or confidential intellectual property.
- Agent ingestion of uncontrolled content. That is, the agent reads data from sources the enterprise does not control, such as public websites. Such content can contain either intentionally incorrect information -- meant to affect enterprise or agent decisions -- or hidden prompts intended to redirect agent goals or actions.
- Agent ability to communicate externally, and so to exfiltrate data.
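To make the combination concrete, here is a minimal Python sketch that flags agents holding all three of Willison's properties at once. The `AgentProfile` record, its field names and the two example agents are hypothetical, invented for illustration; a real inventory would draw on whatever asset or agent registry the organization already maintains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical inventory record for one AI agent."""
    name: str
    reads_sensitive_data: bool        # private or confidential information
    ingests_untrusted_content: bool   # e.g., public websites
    can_communicate_externally: bool  # any outbound channel usable for exfiltration


def trifecta_properties(agent: AgentProfile) -> list[str]:
    """Return the names of the trifecta properties this agent has."""
    checks = {
        "sensitive data access": agent.reads_sensitive_data,
        "untrusted content ingestion": agent.ingests_untrusted_content,
        "external communication": agent.can_communicate_externally,
    }
    return [label for label, present in checks.items() if present]


def has_lethal_trifecta(agent: AgentProfile) -> bool:
    """True only when all three properties are present together."""
    return len(trifecta_properties(agent)) == 3


# Illustrative agents, not real products.
support_bot = AgentProfile("support-bot", True, True, True)
doc_summarizer = AgentProfile("doc-summarizer", True, False, False)

for agent in (support_bot, doc_summarizer):
    print(agent.name, has_lethal_trifecta(agent), trifecta_properties(agent))
```

The check is deliberately all-or-nothing: removing any one of the three properties from an agent breaks the combination, which is why the sketch reports which properties are present rather than only a yes or no.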
Alternatively, some cybersecurity experts include the following properties in the agentic AI lethal trifecta:
- Agent empowerment to act in ways that affect other enterprise systems -- e.g., reconfiguring network devices or modifying databases.
- Agent ability to plan and adaptively pursue long-term objectives without reconfirmation of purpose by a human. Adaptability includes the ability to exploit chains of low-impact vulnerabilities -- e.g., CVEs with low CVSS scores -- to achieve high-impact outcomes such as root-level access to a key server.
- Agent ability to self-improve and gain capabilities -- e.g., modifying its own code; modifying its own goals; finding other tools to fill its functional shortcomings; or designing better models, then creating and using tools based on them.
- Agentic velocity, or the ability to swamp human-scaled governance mechanisms.
- Agentic prompt drift -- i.e., agent non-determinism. Agents and other AIs can produce dramatically different results in response to the same prompt -- and indeed, many jailbreak attacks rely on this to get an AI to break free of its alignment training.
- Agent cost indeterminacy. An AI's actual costs, in terms of tokens expended, can spiral unpredictably due to factors such as prompt drift and "context rot," which drives it into recursive loops of re-reading the same context data.
- Agent superhuman persuasiveness, which lets agents pursue slow and sophisticated social engineering attacks at scales previously impossible.
Pick any subset of these problems, and the core idea is the same: AI plus agency plus permission to act in the enterprise environment add up to a risky synergy with potentially catastrophic consequences.
Why CISOs should pay attention
Agentic AI introduces a new category of cyberthreat -- one that can exploit every other existing threat category. An agent with data access, external connectivity and the ability to act autonomously could reconfigure systems, exfiltrate sensitive data and more, making it both a significant insider threat and an attack vector for external threat actors.
Traditional security tools can't address the potential problems agentic AI creates; for example, traditional web application firewalls can't prevent prompt injection attacks. Organizations must update core architectures to integrate new categories of agentic AI security tools, and must update the policies that govern acceptable use of agentic AI and incident response. However an organization defines the lethal trifecta, the CISO must coordinate and drive the security and governance response.
How to assess your risk exposure
As a CISO assessing your organization's lethal trifecta risk, ask yourself the following key questions:
- How much access do AI agents have to core enterprise software such as a CRM?
- How much access do AI agents have to enterprise data?
- How much access do AI agents have to enterprise infrastructure -- such as network equipment -- and services -- such as an IaaS environment or the DNS service?
- How much access do agents have to the internet?
- How much access do external entities have to systems in the environment, including AI agents -- e.g., through Model Context Protocol (MCP) services?
The answers reveal the reach of AI agents in the enterprise -- including those entering through MCP from outside the organization -- and establish the baseline scope of risk. An inability to answer the questions with confidence is, in itself, a sign of significant risk.
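One way to capture the answers is a simple record per agent in which "we don't know" is a first-class answer. The Python sketch below is illustrative only: the question labels, the `assess_exposure` function and the sample answers are assumptions made for this example, not part of any standard assessment framework.

```python
from typing import Optional

# Short labels for the five questions above (hypothetical naming).
QUESTIONS = (
    "core enterprise software",
    "enterprise data",
    "infrastructure and services",
    "internet",
    "reachable by external entities",
)


def assess_exposure(answers: dict[str, Optional[bool]]) -> dict[str, list[str]]:
    """Sort the questions into confirmed access, confirmed no access and unknown.

    An answer of None, or a missing answer, is treated as unknown.
    """
    result: dict[str, list[str]] = {"access": [], "no_access": [], "unknown": []}
    for question in QUESTIONS:
        answer = answers.get(question)
        if answer is None:
            result["unknown"].append(question)
        elif answer:
            result["access"].append(question)
        else:
            result["no_access"].append(question)
    return result


# Sample answers for one illustrative agent; the fifth question was never answered.
report = assess_exposure({
    "core enterprise software": True,
    "enterprise data": True,
    "infrastructure and services": False,
    "internet": None,
})
print(report)
```

Anything that lands in the `unknown` bucket deserves the same attention as confirmed access, because the organization cannot rule out that the access exists.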
Mitigation strategies
The best strategy for mitigating agentic AI risk is, as is so often the case, implementing a zero-trust architecture. Infuse the AI infrastructure with core zero-trust principles, strictly limiting access to systems and data based on identity and allow lists. At a minimum, this will mean the following:
- Adding identity management for AI agents, either by deploying a new ID management system specifically for agents or by extending an existing system able to meet requisite scale and speed targets. Software that manages identity for Kubernetes containers might serve, for example.
- Channeling communications from and to AI agents through MCP gateways or the like, to provide control points for allowing or denying access and for monitoring behavior.
- Adopting a "deny all" default access level and then allowing specific entities to do specific things, as necessary.
- Extending the tool set to include the following:
  - Semantic firewalls that sniff out prompt injection, hypnosis attempts and so on.
  - "Path-dependent" access management systems that assess inbound and outbound prompts based on the context of past prompts, and watch for slow, subtle attack patterns.
  - Model drift monitoring.
  - Behavioral threat monitoring to catch and interrupt risky agent behavior patterns -- e.g., revoke an agent's access to a key database if it repeatedly tries to perform operations on the database for which it doesn't have permission.
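As a rough illustration of how two of these controls fit together, the Python sketch below combines a "deny all" default with behavior-based revocation at a single control point. The `AgentGateway` class, its method names and the threshold of three denied attempts are hypothetical choices made for this example; they do not describe any particular MCP gateway or commercial product.

```python
from collections import Counter


class AgentGateway:
    """Hypothetical control point that mediates every agent request."""

    def __init__(self, max_denials: int = 3):
        self.allowed: set[tuple[str, str, str]] = set()  # (agent, resource, action)
        self.denials: Counter = Counter()                 # (agent, resource) -> count
        self.revoked: set[tuple[str, str]] = set()        # (agent, resource)
        self.max_denials = max_denials

    def allow(self, agent: str, resource: str, action: str) -> None:
        """Grant one specific agent one specific action on one resource."""
        self.allowed.add((agent, resource, action))

    def authorize(self, agent: str, resource: str, action: str) -> bool:
        """Deny by default; permit only explicit grants that have not been revoked."""
        key = (agent, resource)
        if key in self.revoked:
            return False
        if (agent, resource, action) in self.allowed:
            return True
        # Count the denied attempt and revoke all of the agent's access to
        # this resource once the count reaches the threshold.
        self.denials[key] += 1
        if self.denials[key] >= self.max_denials:
            self.revoked.add(key)
        return False


gateway = AgentGateway(max_denials=3)
gateway.allow("report-agent", "sales-db", "read")

print(gateway.authorize("report-agent", "sales-db", "read"))   # explicit grant
print(gateway.authorize("other-agent", "sales-db", "read"))    # no grant, denied
```

In this sketch, an agent that keeps attempting operations it has no permission for loses even its legitimate access to that resource until someone reviews and reinstates it. A production system would also need to log each decision and tie agent names to verified identities.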
John Burke is CTO and a research analyst at Nemertes Research. Burke joined Nemertes in 2005 with nearly two decades of technology experience. He has worked at all levels of IT, including as an end-user support specialist, programmer, system administrator, database specialist, network administrator, network architect and systems architect.