All writing

AI Security Jun 2026 · 8 min read

Prompt Injection Is a Confused-Deputy Problem We Already Solved

A trusted AI agent acting on an attacker's hidden instruction, the classic confused-deputy problem from systems security

Prompt injection is not a new class of problem. It is a confused-deputy attack, a trusted program tricked into misusing the authority it legitimately holds, and we have had a name for it since Norm Hardy described it in 1988. I have spent seventeen years building the boundaries that contain exactly this failure mode in enterprise networks. Watching the AI-agent world rediscover it from first principles, and reach for all the wrong fixes, is a strange experience. The good news is that the right fix is old, proven, and sitting in the network engineer's toolbox.

Let me make the case plainly, because it reframes the whole debate about whether prompt injection is "solvable." It is not solvable by making the model better. It was never going to be. It is containable, the same way we have contained confused deputies in networks for decades, and the sooner agent builders accept that, the sooner they ship things that survive contact with an attacker.

What a confused deputy actually is

The original example is a compiler. A shared compiler runs with the privilege to write to a protected billing file, that is legitimate, it needs to record usage. A user invokes it and passes an output filename. Nothing stops them passing the billing file's path as that "output" name. The compiler, acting on the user's request but using its own elevated privilege, cheerfully overwrites the billing records. The user never had permission to touch that file. The compiler did, and it was tricked into spending that permission on the user's behalf.

That is the whole shape: a deputy with real authority, an input it cannot fully distinguish from a legitimate instruction, and an attacker who supplies the input. The deputy is not malicious or stupid. It is doing its job, with its own rights, on someone else's command. Confused-deputy problems are about authority, not authentication, and that distinction is the entire point.

Indirect prompt injection is the same shape, one layer up

Now describe an AI agent. It holds real authority, it can read your mailbox, call tools, spend from a budget, write to a database. To do its work it reads content an attacker can influence, a web page, an email, a PDF, a tool result. And a language model has no reliable way to separate "the user's instruction" from "text I just read," because both arrive as the same stream of tokens. An attacker writes an instruction into the content, the agent reads it, and the agent spends its own authority carrying it out. That is a confused deputy with a vocabulary.

This is why it does not get patched away. A 2026 study that ran thousands of attacks against browser agents found direct prompt injection succeeded more than 79% of the time, and its title says the rest: agents may always fall for it. Simon Willison gave the dangerous configuration a name, the lethal trifecta, an agent with private data, untrusted input, and a way to send data out is unconditionally exposed. I made a version of this argument before, that the firewall CVE and the AI-agent skill attack were the same mistake one layer apart; the confused deputy is the older, deeper name for what that mistake actually is.

Network security never tried to make the deputy smarter

Here is the part the agent world keeps missing. When we faced confused deputies in networks, we did not try to make the deputy better at telling good requests from bad ones. We assumed it could not, and we put a boundary around it.

That is the whole discipline of network segmentation. A web server is a deputy with database access. We never expected it to perfectly distinguish a legitimate query from an injected one, so we put it in its own segment, gave it least-privilege access to exactly the data it needs, and filtered what it could reach. Egress filtering is the same instinct applied to exfiltration, even a fully compromised host cannot send data to an arbitrary destination if the firewall only permits a short allowlist. Default-deny, microsegmentation, and a hard line between the data plane and the control plane, none of these make the endpoint smarter. They make its mistakes survivable. I have written about the same separation in the context of authentication bypass across network and AI systems, where trusting the wrong boundary is the recurring root cause.

The agent world keeps reaching for a smarter deputy

Almost every popular prompt-injection defense is an attempt to make the deputy smarter. Better system prompts that beg the model to ignore injected instructions. Jailbreak classifiers that try to spot the malicious text. Delimiter tricks that wrap untrusted content and hope the model respects the wrapper. All of them are the compiler trying harder to tell which filename is the billing file, and all of them lose to an attacker with unlimited rewordings and unlimited attempts. We learned this lesson in network security the expensive way: you cannot pattern-match your way out of a boundary problem. The fully autonomous AI-agent attacks now appearing are simply attackers exploiting that confused deputy at machine speed.

The real fix is the old fix: trust boundaries and capabilities

The defenses that actually hold against prompt injection are, almost word for word, the network playbook moved up a layer. Google DeepMind's CaMeL design is the clearest example, and it does not hide its sources: it borrows control-flow integrity and capability-based security, two ideas straight out of systems and network security. A privileged model writes the plan from your trusted request alone; a separate quarantined model handles the untrusted content and is denied the ability to call tools. The plan is fixed before untrusted data is read, so that data can change what the agent knows but never what it does. On a standard benchmark it blocked close to 100% of attacks. That is not a smarter deputy. That is a boundary between the data plane and the control plane, exactly what segmentation has always been.

Failure dimensionNetwork security answerAI-agent answer
Untrusted input reaching the control pathSegmentation: the data plane kept apart from the control planeDual-model design: the plan is fixed before untrusted data is read
Too much standing authorityLeast privilege, scoped per segmentCapability scoping, per tool call
Data leaving on a hijackEgress filtering to an allowlistEgress allowlist on the agent
"Make the endpoint smarter"Never the strategyNot the strategy either

The practical version for anyone shipping an agent today is the same checklist I would apply to a network: scope the agent's authority to the narrowest set that does the job, control its egress so a hijack has nowhere to send data, and require a human to approve the irreversible. It is least privilege, default-deny, and separation of duties, applied to a model instead of a subnet. The risk framing I outlined in moving from CVSS to attack-success rate is the measurement side of the same idea, and the OWASP LLM Top 10 is the field guide.

Why this is one move, not a career change

People ask how a network-security architect ends up working on AI security, as if it were a leap. It is not. The attacker's move is the same one I have defended against for seventeen years, get a trusted component to spend its authority on your behalf, and the defense is the same one too, do not trust the component to be clever, put a boundary around what it can reach. The tokens are new. The token stream that mixes command and data is new. The fact that the deputy now speaks English is new and genuinely makes the attack easier to write. But the underlying problem is the one Norm Hardy named in 1988, and the discipline that contains it is the one regulated networks have run for decades. For the German Mittelstand specifically, the same boundary-first thinking is what I lay out in the zero-trust 90-day plan.

So when someone tells you prompt injection is an unsolvable problem, hear what they are actually saying: the deputy will always be confusable. They are right, and it does not matter, because we stopped relying on un-confusable deputies a long time ago. We build boundaries instead. The agent era just needs to remember that.


If you are putting AI agents into production and want them threat-modelled with a network engineer's eye for trust boundaries, request a review. I run AI security engagements anchored in 17+ years of enterprise cybersecurity. See also FwChange.com for the change-management side of the same discipline.

Shipping agents that act on untrusted input?

Threat-model the authority your agents hold before an attacker does it for you.

Request an AI security review