Law 35 · Safety & Security

Sandbox the Blast Radius

Assume the agent gets compromised, then contain what it can reach.

The principle

Defense in depth means planning for the injection that succeeds. Box the agent in with filesystem isolation (access scoped to specific directories) and network isolation (exfiltration blocked), and a compromised agent can't reach past its sandbox. Real incidents, like CI agents that could leak secrets through untrusted content, show why that second layer matters when the first one fails.

Why it happens

Assume one prevention layer fails. A sandbox limits the damage when it does. Filesystem isolation keeps the agent inside the task directory instead of the whole machine. Network isolation prevents a compromised run from posting secrets to arbitrary hosts. Credential isolation keeps ambient tokens out of reach. These controls are boring and deterministic, which is exactly why they matter. Prompt-injection defenses may reduce the chance of compromise; sandboxing reduces the blast radius after compromise. The second layer is what turns a bad instruction into a contained incident.
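
To make the filesystem layer concrete, here is a minimal sketch of a path-confinement check a tool wrapper might run before touching any file. The names `resolve_inside` and `SandboxEscape` are hypothetical, and an in-process check like this complements OS-level isolation (containers, mounts) rather than replacing it.

```python
from pathlib import Path


class SandboxEscape(Exception):
    """Raised when a requested path resolves outside the task directory."""


def resolve_inside(task_dir: str, requested: str) -> Path:
    """Return the resolved path if it stays inside task_dir, else raise."""
    root = Path(task_dir).resolve()
    # Joining and then resolving collapses ".." segments and follows
    # symlinks, so the check runs against the real target. An absolute
    # `requested` replaces the root entirely and fails the check below.
    candidate = (root / requested).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise SandboxEscape(f"{requested!r} resolves outside {root}") from None
    return candidate
```

A call such as `resolve_inside("/tmp/task", "../../etc/passwd")` raises `SandboxEscape`, while a path under the task directory comes back resolved.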

Watch for

Agent tool execution runs with the full host environment, including credentials in environment variables.
The agent has unrestricted outbound network access rather than an allowlist of required destinations.
A successful injection could read or write files well outside the task's intended working directory.
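
The first warning sign above can be checked mechanically. The sketch below lists environment variables whose names look like credentials. The pattern list and the function name `ambient_secret_names` are assumptions for illustration, so tune them to what your runners actually hold.

```python
import re
from typing import List, Mapping

# Hypothetical name patterns; extend them for your own credential naming.
SECRET_NAME = re.compile(
    r"(TOKEN|SECRET|PASSWORD|API_KEY|ACCESS_KEY|CREDENTIAL)", re.IGNORECASE
)


def ambient_secret_names(env: Mapping[str, str]) -> List[str]:
    """Return sorted names of variables that look like credentials.

    Only names are inspected and returned, never values, so the result
    is safe to log.
    """
    return sorted(name for name in env if SECRET_NAME.search(name))
```

Running `ambient_secret_names(os.environ)` inside the agent's tool process (after `import os`) gives a quick signal: an empty list is what you want, and a non-empty one means the agent can read those variables. Name matching is a heuristic and will miss secrets stored under innocuous names.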

In practice

Your CI agent runs untrusted PR branches with the build runner's full environment: cloud credentials sit in env vars, and egress to the internet is open. A contributor's PR adds a test that reads those secrets and POSTs them to the contributor's server, and the injection succeeds on the first try. Defense in depth assumes exactly this. Run agent tool execution in a container scoped to the one working directory, with an egress allowlist that blocks everything but the registries you need, so a successful compromise is a contained annoyance instead of a credential leak.
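
One way to sketch that container setup is to build the `docker run` command line explicitly, so the isolation flags are visible and reviewable. This assumes Docker is the container runtime; the function name `sandboxed_docker_argv` and the `/work` mount point are hypothetical. Note that `--network none` cuts off all egress, which is stricter than an allowlist. Letting specific registries through would need a proxy or firewall rules that this sketch does not cover.

```python
from pathlib import Path
from typing import List


def sandboxed_docker_argv(image: str, workdir: str, command: List[str]) -> List[str]:
    """Build a `docker run` argv that confines a command to one directory."""
    host_dir = str(Path(workdir).resolve())
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no outbound network at all
        "--read-only",                          # root filesystem is read-only
        "--cap-drop", "ALL",                    # drop Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--volume", f"{host_dir}:/work",        # the only host path mounted
        "--workdir", "/work",
        # No --env or --env-file flags: host environment variables are not
        # forwarded into the container unless explicitly passed.
        image, *command,
    ]
```

The resulting list can be handed to `subprocess.run(argv, check=True)`. Tools that need scratch space may also require a writable temporary directory, for example via Docker's `--tmpfs` option.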

Apply it

Run tool execution in an isolated environment scoped to a single working directory with no access to ambient secrets.
Enforce an egress allowlist that blocks all outbound traffic except the specific destinations the task requires.
Design assuming the injection succeeds, and verify that the worst reachable outcome is contained, not catastrophic.
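
The allowlist decision itself is small enough to sketch. The function below is the kind of check an egress proxy might apply to each outbound request. The hosts in `ALLOWED_HOSTS` and the name `egress_allowed` are hypothetical, and the enforcement point has to live outside the agent's own process (a proxy or firewall), since compromised code can skip any check that runs alongside it.

```python
from typing import FrozenSet
from urllib.parse import urlsplit

# Hypothetical allowlist: the registries this particular task needs.
ALLOWED_HOSTS: FrozenSet[str] = frozenset({"pypi.org", "files.pythonhosted.org"})


def egress_allowed(url: str, allowed_hosts: FrozenSet[str] = ALLOWED_HOSTS) -> bool:
    """Allow only HTTPS requests whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    # `hostname` is lowercased and excludes any port or userinfo, so
    # "https://pypi.org@evil.example/" is judged by "evil.example".
    host = parts.hostname
    return parts.scheme == "https" and host is not None and host in allowed_hosts
```

Exact-match comparison is deliberate: suffix or substring matching would let a lookalike such as `pypi.org.evil.example` through. Default-deny means anything not listed is blocked, including plain HTTP to an allowed host.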

The takeaway

Run agent tool execution in an isolated environment with constrained filesystem and network access, so a successful injection stays contained instead of turning catastrophic.
