The Cyber Kill Chain Wasn't Built for AI Agents. Here's How I Extended It

Share
The Cyber Kill Chain Wasn't Built for AI Agents. Here's How I Extended It
Photo by Karine Avetisyan / Unsplash

A compromised AI agent doesn't need an exploit to do damage. It doesn't escalate privileges or load shellcode. It calls the tools it was already given, with arguments an attacker influenced through a poisoned web page or a malicious tool description. By the time anything looks wrong, the agent has already sent the email, written the file, or moved data between two systems it had every right to touch.

I kept hitting the gap between that reality and the model most of us still use to reason about intrusions. The Lockheed Martin Cyber Kill Chain has been the backbone of detection and response for over a decade. Seven stages, from reconnaissance to actions on objectives, and one powerful idea behind them: break any single stage and the whole attack fails. I wrote a book with Shreyas Kumar last year about putting that model to work in practice. I still believe in it.

But the seven stages carry assumptions that AI quietly breaks. They assume a human attacker working a network and endpoint surface. They assume the attacker has to get code onto a system and run it. Neither holds when the target is a language model or an agent with tool access.

So I extended the model instead of throwing it out. The result is the Extended Cyber Kill Chain for AI-Era Threats, and it is now public and openly licensed.

Where the original model breaks

Four assumptions fail, and each failure pointed at something the framework had to add.

Timing is the first problem. A poisoned training set or a backdoored model published to a public registry can compromise an organization before it has any contact with the attacker at all. The compromise is staged into the supply chain and waits to be pulled in. The canonical kill chain starts at reconnaissance and has nowhere to put an attack that lands before reconnaissance even begins.

Then there is the line between code and data, which language models don't really have. A model reads instructions and content through the same channel and can't reliably tell them apart. Hidden text in a web page or a tool description becomes an instruction the model follows. What the kill chain treats as two distinct stages, delivery and exploitation, collapses into a single move: indirect prompt injection.

There is also the question of what the attacker is even after. We have always framed the endgame as data exfiltration. With AI systems, the model itself is a target. An attacker can reconstruct it by querying its API at scale, pull memorized training data back out of it, or extract the proprietary system prompt that encodes your business logic. Stealing the model is a different attack from stealing the data, and it calls for different defenses.

And finally there is movement. An agent with access to email, files, calendars, and payment tools can pivot between them without a single classical lateral-movement technique. No exploit, no reused credential. It just chains tool calls it was authorized to make. The mechanics look nothing like lateral movement, but the effect on your environment is the same.

What the framework adds

It keeps all seven original stages. On top of them it does three things.

It adds a Stage 0, Model Supply Chain Compromise, that sits in front of reconnaissance. Training-data poisoning, fine-tune backdoors, malicious models on public registries, and poisoned tool catalogs all live here. Consolidating them into one stage gives defenders a single place to reason about supply-chain risk, in the same language they already use for everything downstream.

Inside each of the seven stages, it adds AI-specific sub-techniques, and every one carries an ID a detection rule or a threat report can cite. If you write detections for a living, you can point at EKC-3.5 (tool-description poisoning) the way you point at a MITRE technique today.

The last change is to the final stage, which splits into three peer objectives. Classical data exfiltration stays. Model extraction and agentic pivot join it as first-class objectives rather than footnotes under exfiltration, because the controls that catch one of them will sail right past the other two.

A concrete walk-through

Take the browsing-agent case. A user asks an AI assistant to summarize an article. The page hides instructions, white text on a white background, telling the agent to read the user's other open tabs and post their contents to an attacker's server.

Walk that through the framework and it lines up cleanly. At reconnaissance, the attacker already knows this class of agent can browse and make outbound requests. At delivery, the malicious page gets fetched (EKC-3.1, indirect prompt injection via web content). At exploitation, the agent treats the hidden text as authoritative and calls its HTTP tool with attacker-controlled arguments (EKC-4.3, confused-deputy tool invocation). At actions on objectives, it reads the tabs and ships them out (EKC-7c.1 and EKC-7a).

The payoff is the detection insight that falls out at the end. The earliest reliable signal here is a tool call whose arguments trace back to recently retrieved untrusted content. A team that tracks argument lineage on its outbound tools catches this before the data leaves. That is the kind of stage-specific control a kill-chain framing surfaces and a flat risk list does not.

How it sits with ATLAS and OWASP

This is not a replacement for MITRE ATLAS or the OWASP LLM Top 10. I use both, and the repo ships full mappings to each. The difference is shape. ATLAS is a tactics matrix. OWASP is a prioritized risk list. Both are excellent at what they do, and neither is a kill chain. If your team already reasons in kill-chain stages, this hands you the same AI threat surface in the form your playbooks are built around, with the mappings so you can move between all three without losing your place.

It is worth saying that ATLAS has been moving fast. It reintroduced Command and Control as a tactic in early 2025 and added Lateral Movement late in the year, and its agentic coverage is strong. The framework I am publishing isn't trying to out-catalog ATLAS. It is doing a different job: putting that threat surface in temporal, kill-chain order for the defenders who think that way.

It's yours to use

The framework lives on GitHub at github.com/gouravnagar-infosec/ai-kill-chain under a CC BY license. The repo has the full specification, the diagram, four worked case studies, and the ATLAS and OWASP mappings. It is versioned, so the sub-technique IDs are stable enough to reference in your own detections and reports.

I built this because I needed it and couldn't find it. If you run detection or threat modeling against AI systems, I want to know where it holds and where it falls apart. Open an issue, send a pull request, or just tell me what I got wrong. That is how it gets better.

Read more