Law 31 · Safety & Security

The Lethal Trifecta

Private data, untrusted content, and a way out. Pick at most two.

The principle

An agent becomes exploitable the moment it combines three things: access to private data, exposure to untrusted content, and the ability to send data out. Any one poisoned input in that pipeline can steer it into leaking your data, with no code vulnerability required. Guardrail prose isn't enough, because the model can't be the security boundary.

Why it happens

The danger comes from the combination, not from any single capability. Once private data, untrusted input, and an outbound channel meet in one agent, a malicious document, email, or web page can tell the agent to read a secret and leak it through a tool call, URL, image fetch, or message. No memory-corruption bug is required; the model only has to follow the wrong instruction once. Filtering the payload is a weak defense because attackers adapt their wording until something gets through. Breaking the chain is stronger: remove one capability, isolate the data, or make outbound actions narrow, reviewed, and allowlisted.
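As a rough illustration of a narrow, allowlisted outbound action, here is a minimal sketch in Python. The function name `is_allowed_destination` and the hosts in `ALLOWED_HOSTS` are placeholders invented for this example, not part of any particular agent framework; the idea is that the check lives in ordinary code outside the model, so a poisoned instruction cannot talk its way past it.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: these hosts are placeholders for illustration only.
ALLOWED_HOSTS = frozenset({"api.internal.example", "status.example.com"})


def is_allowed_destination(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose exact host is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs (for example, broken IPv6 brackets) are rejected outright.
        return False
    if parts.scheme != "https":
        return False
    host = parts.hostname  # userinfo and port are stripped; value is lowercased
    if host is None:
        return False
    # Exact match only: no suffix or substring matching, so look-alike
    # hosts such as "status.example.com.evil.test" do not pass.
    return host in allowed_hosts
```

An allowlist narrows the channel rather than closing it: a request to an allowed host can still carry data in its path or query string, so it is best treated as one layer alongside removing a capability or adding review.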

Watch for

One agent context has access to secrets or private records AND processes text from emails, web pages, or user uploads.
The same agent that reads untrusted input can also send email, make outbound HTTP calls, or write to a shared external store.
Your only defense against malicious instructions is a system-prompt line telling the model to ignore them.

In practice

Your support agent reads customers' private ticket history, ingests the body of an inbound email, and can call a send_email tool to reply. That is all three legs: private data, untrusted content, and an exfiltration path. A customer hides a request in their email signature, asking the agent to forward another user's account details to an outside address. The agent obliges, because it cannot tell that instruction apart from a real one. The fix is not a cleverer system prompt. The fix is to drop one leg: make the reply tool draft-only behind human review, or strip the agent's access to other customers' data while it is processing inbound mail.
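One way to picture the draft-only option is the sketch below. `Draft`, `DraftOutbox`, `draft_reply`, and `approve_and_send` are hypothetical names made up for this example, and the `sent` list stands in for a real mail call. The point it tries to show is that the agent is handed only `draft_reply`, while sending sits behind a separate step that a person performs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Draft:
    to: str
    body: str
    approved: bool = False


@dataclass
class DraftOutbox:
    """Holds agent-written replies until a human approves them (hypothetical)."""
    drafts: List[Draft] = field(default_factory=list)
    sent: List[Draft] = field(default_factory=list)

    def draft_reply(self, to: str, body: str) -> int:
        """The only tool exposed to the agent: queue a draft, send nothing."""
        self.drafts.append(Draft(to=to, body=body))
        return len(self.drafts) - 1  # draft id used by the reviewer

    def approve_and_send(self, draft_id: int, reviewer: str) -> Draft:
        """Called from the human review UI, never registered as an agent tool."""
        if not reviewer:
            raise PermissionError("a named human reviewer is required")
        draft = self.drafts[draft_id]
        draft.approved = True
        self.sent.append(draft)  # stand-in for the real outbound mail call
        return draft


outbox = DraftOutbox()
draft_id = outbox.draft_reply("someone@outside.test", "forwarded account details")
# Nothing has left the system yet; a reviewer can inspect or discard the draft.
```

Under this design the injected instruction can still produce a bad draft, but the exfiltration leg is gone until a human approves it.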

Apply it

For each workflow, enumerate all three capabilities (private data, untrusted input, outbound channel) and confirm whether one agent holds all three at once.
If all three are present, break the chain: drop one tool, split the data access from the untrusted-input path, or route the outbound action through human review.
Make any action that communicates externally draft-only, or allowlist it to known-safe destinations, rather than leaving it free-form.
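The enumeration step can be sketched as a small audit helper. This is a minimal example under stated assumptions: the capability labels, the tool names, and the functions `capabilities_held` and `holds_trifecta` are all invented for illustration, and tagging each tool with the capabilities it grants is something you would do by hand for your own workflow.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical labels for the three capabilities.
TRIFECTA: FrozenSet[str] = frozenset(
    {"private_data", "untrusted_input", "outbound_channel"}
)


def capabilities_held(tools: Dict[str, Set[str]]) -> Set[str]:
    """Union of the trifecta capabilities granted by an agent's tools."""
    held: Set[str] = set()
    for caps in tools.values():
        held |= caps & TRIFECTA
    return held


def holds_trifecta(tools: Dict[str, Set[str]]) -> bool:
    """True when a single agent context holds all three capabilities at once."""
    return capabilities_held(tools) == TRIFECTA


# Example workflow: tool names are placeholders for a support agent.
support_agent = {
    "read_ticket_history": {"private_data"},
    "read_inbound_email": {"untrusted_input"},
    "send_email": {"outbound_channel"},
}

# Breaking the chain: the same agent without its outbound tool.
without_send = {k: v for k, v in support_agent.items() if k != "send_email"}
```

Here `holds_trifecta(support_agent)` is true and `holds_trifecta(without_send)` is false, which is the signal to split or restrict the first configuration.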

The takeaway

Audit every agent for all three capabilities at once. If a workflow has all three, break the chain: remove a tool, isolate the data, or put a human at the gate.

