Blast radius for AI agents: a working framework


A pattern I see over and over in AI security reviews: the team has thought hard about what the agent can do. They’ve spent almost no time on what the things it does can reach.

That’s the wrong axis. The tool list tells you what the agent can do; blast radius tells you what those calls can reach.

A working definition

For any AI agent in a production environment, the blast radius is the union of every system, dataset, identity, and downstream consumer that can be affected — directly or transitively — by the agent’s outputs and tool calls, assuming the agent itself is compromised.

That last clause is doing a lot of work. The point isn’t to model what the agent should do. It’s to model what an attacker who owns the agent (via prompt injection, jailbreak, or model compromise) gets for free.

Five dimensions to score

When I’m scoping the blast radius of an agent, I’m asking five questions; a sketch of how I record the answers follows the list:

1. Identity reach. What runtime identity does this agent execute as? What is that identity entitled to do across your environment? “Service principal with Mail.Read on tenant” is a different blast radius than “service principal with Mail.ReadWrite.All plus Sites.FullControl.All.”

2. Data reach. Reads first, writes second. Reads matter because output-layer exfiltration is real and rarely detected. Writes matter because a bad write can be irreversible and tends to amplify downstream.

3. Tool reach. What function calls are wired up? For each one: can it cause an external-facing action (an email, a payment, an outbound API call) or only an internal-facing one (a database query, a file read)? External tools are an order of magnitude more dangerous.

4. Downstream reach. Where does the agent’s output go? Into a human’s inbox? Into a dashboard? Into another agent’s prompt? The third option is the one most teams haven’t thought about. Agent-to-agent flows turn small failures into big ones.

5. Recovery reach. When the agent does something wrong, what’s the rollback story? Reversible vs. irreversible matters more than a lot of risk frameworks acknowledge.
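
Here’s one way to capture those five answers per agent. This is a minimal sketch; the AgentBlastRadius dataclass, the Reach enum, and the field names are all illustrative inventions for this post, not part of any tool or standard.

```python
from dataclasses import dataclass, field
from enum import Enum


class Reach(Enum):
    NONE = 0      # the agent can't touch this at all
    NARROW = 1    # a handful of scoped resources
    BROAD = 2     # tenant- or org-wide


@dataclass
class AgentBlastRadius:
    agent_name: str

    # 1. Identity reach: what the runtime identity is entitled to do.
    identity_scopes: list[str] = field(default_factory=list)
    identity_is_admin: bool = False   # e.g. tenant-wide *.All / FullControl scopes

    # 2. Data reach: reads first, writes second.
    read_reach: Reach = Reach.NONE
    write_reach: Reach = Reach.NONE

    # 3. Tool reach: external-facing tools are the dangerous ones.
    external_tools: list[str] = field(default_factory=list)   # email, payments, outbound API calls
    internal_tools: list[str] = field(default_factory=list)   # database queries, file reads

    # 4. Downstream reach: where the output lands.
    feeds_other_agents: bool = False
    human_reviews_output: bool = True

    # 5. Recovery reach: the rollback story.
    all_actions_reversible: bool = True
```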

Score, then act

I don’t think numerical risk scores carry much weight in isolation. But scoring forces the conversation. A useful rough rubric, with a scoring sketch after the list:

  • Low blast radius — agent’s identity has narrow read-only scopes, no external tool calls, no agent-to-agent handoff, output goes only to a human reviewer.
  • Medium — read scopes are broad, one or two internal tools, output goes to other automated systems.
  • High — write permissions, external tools, agent-chained outputs, no human in the loop on irreversible actions.
  • Catastrophic — runs as a service identity with admin scopes anywhere in your environment. Don’t ship this until you’ve removed at least two of the factors above.
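
Continuing the sketch above, here’s roughly how that record maps onto the rubric. The thresholds are my judgment calls, not a standard; tune them to your environment.

```python
def blast_radius_tier(r: AgentBlastRadius) -> str:
    """Map an AgentBlastRadius record onto the rough rubric; thresholds are judgment calls."""
    # Catastrophic: an admin-scoped service identity dominates every other factor.
    if r.identity_is_admin:
        return "catastrophic"

    # High: writes, external-facing tools, agent-chained output, or irreversible
    # actions with no human in the loop.
    if (
        r.write_reach is not Reach.NONE
        or r.external_tools
        or r.feeds_other_agents
        or (not r.all_actions_reversible and not r.human_reviews_output)
    ):
        return "high"

    # Medium: broad reads, internal tools, or output that lands in other
    # automated systems rather than only a human's inbox.
    if r.read_reach is Reach.BROAD or r.internal_tools or not r.human_reviews_output:
        return "medium"

    # Low: narrow read-only reach, no tools beyond reads, human-reviewed output.
    return "low"


# Example: a narrowly scoped, read-only triage agent with human review of its output.
triage = AgentBlastRadius("mail-triage", identity_scopes=["Mail.Read"],
                          read_reach=Reach.NARROW)
print(blast_radius_tier(triage))   # -> low
```

The exact thresholds matter less than the fact that filling in the record forces you to collect the five answers in the first place.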

The question that ends most arguments

When a team disagrees about whether an agent is “safe enough” to ship, I try to skip the abstract debate and ask one thing:

If this agent’s system prompt got swapped for one written by an attacker tonight, and the agent kept running with all its current permissions, what would the worst 24 hours look like?

If the answer is “an embarrassing email,” ship it.

If the answer involves the word “irrecoverable” or any external transfer of money, data, or trust — go back to the drawing board.


The AI Asset & Blast Radius Mapper operationalizes this exact framework, with AI-assisted scoring and CISO-facing narrative generation.