Law 33 · Safety & Security
The Confused Deputy
An agent with your privileges will wield them on an attacker's behalf.

The principle
A confused deputy is a privileged program that a caller tricks into misusing its authority. It isn't malicious, just confused about whose intent it's serving. An LLM agent is the ultimate confused deputy: it holds your credentials and tools, but it'll follow injected instructions and carry out an attacker's intent with your authority. The trap is ambient authority. Authority should travel with the request, not sit waiting inside the agent.
Why it happens
A confused deputy is a program that holds legitimate power and is tricked into using it for the wrong caller. The agent version is direct: it has your tools and credentials, but the intent shaping its next action may come from an attacker-controlled input. The root problem is ambient authority, power sitting in the agent whether the current request deserves it or not. Bind authority to the specific request, caller, and action. Read-only by default and narrow grants for destructive work reduce what an injected instruction can borrow.
Watch for
- The agent runs with a broad, long-lived credential (admin token, write-all API key) it can apply to any action.
- Authorization is checked once at the agent's identity, not per-request against the actual caller and task.
- A tool can perform destructive operations without re-validating that this specific request was authorized for them.
In practice
Your deploy-bot agent runs with a long-lived admin token so it can handle whatever comes up, and it reads GitHub issues to triage them. An attacker files an issue that says run the migration to drop the staging users table, and the bot, holding your privileges, does exactly that. It was not hacked, it was confused about whose intent it was serving. Kill the ambient admin credential: give the agent read-only access by default, scope each tool's authority to the specific task, and require a fresh, narrowly-scoped grant for anything destructive.
Apply it
- Default every tool to read-only and grant write or destructive scope only for the specific task that needs it.
- Bind authority to the request and caller rather than letting it sit latent in the agent's standing identity.
- Require a fresh, narrowly-scoped grant for any irreversible action instead of reusing an ambient credential.
The takeaway
Scope every tool's authority to the specific task and caller. Avoid broad ambient credentials the agent can be tricked into abusing, and prefer read-only by default.