Law 35 · Safety & Security
Sandbox the Blast Radius
Assume the agent gets compromised, then contain what it can reach.

The principle
Defense in depth means planning for the injection that succeeds. Box the agent in with filesystem isolation (access scoped to specific directories) and network isolation (exfiltration blocked), and a compromised agent can't reach past its sandbox. Real incidents, such as CI agents that untrusted content could steer into leaking secrets, show why this second layer matters when the first one, preventing the injection, fails.
Why it happens
Assume one prevention layer fails. A sandbox limits the damage when it does. Filesystem isolation keeps the agent inside the task directory instead of the whole machine. Network isolation prevents a compromised run from posting secrets to arbitrary hosts. Credential isolation keeps ambient tokens out of reach. These controls are boring and deterministic, which is exactly why they matter. Prompt-injection defenses may reduce the chance of compromise; sandboxing reduces the blast radius after compromise. The second layer is what turns a bad instruction into a contained incident.
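A minimal sketch of the credential-isolation piece, assuming tool calls are launched as subprocesses from Python. `run_tool` and `SAFE_ENV_KEYS` are hypothetical names chosen for illustration. Passing an explicit `env` replaces the inherited environment instead of extending it, so ambient tokens never reach the child. Note that `cwd` only sets the starting directory and does not confine the process; filesystem and network isolation still need an OS-level boundary such as a container.

```python
from __future__ import annotations

import os
import subprocess
import sys

# Hypothetical allowlist of variables the tools actually need. Everything
# else in the parent environment (cloud keys, CI tokens) is left behind.
SAFE_ENV_KEYS = ("PATH", "LANG", "HOME")


def run_tool(cmd: list[str], workdir: str, timeout: float = 60) -> subprocess.CompletedProcess:
    """Run one tool command with a scrubbed environment, starting in workdir."""
    env = {k: os.environ[k] for k in SAFE_ENV_KEYS if k in os.environ}
    return subprocess.run(
        cmd,
        cwd=workdir,   # starting directory only; this is NOT confinement
        env=env,       # replaces, rather than extends, the inherited environment
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )


if __name__ == "__main__":
    os.environ["AWS_SECRET_ACCESS_KEY"] = "example-not-a-real-key"
    probe = [sys.executable, "-c",
             "import os; print(os.environ.get('AWS_SECRET_ACCESS_KEY'))"]
    print(run_tool(probe, workdir=".").stdout.strip())  # prints "None"
```

The child sees no `AWS_SECRET_ACCESS_KEY` even though the parent has one set, which is the property the prose asks for: the secret is simply not reachable, regardless of what the tool is told to do.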
Watch for
- Agent tool execution runs with the full host environment, including credentials in environment variables.
- The agent has unrestricted outbound network access rather than an allowlist of required destinations.
- A successful injection could read or write files well outside the task's intended working directory.
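One way to catch the last case inside an agent's own file tool is to resolve every requested path and reject anything that lands outside the task root. This is a sketch with hypothetical names (`resolve_in_sandbox`, `SandboxEscape`). It is a cheap extra check, not a substitute for mounting only the task directory: a compromised process can bypass application code entirely, and a check-then-use pattern can race with filesystem changes.

```python
from __future__ import annotations

from pathlib import Path


class SandboxEscape(Exception):
    """Raised when a requested path resolves outside the sandbox root."""


def resolve_in_sandbox(root: str | Path, requested: str) -> Path:
    """Resolve `requested` against `root` and refuse anything that escapes it.

    Path.resolve() collapses ".." and follows symlinks, so "../x",
    absolute paths like "/etc/passwd", and symlinks pointing outside
    the root all resolve to locations this check rejects.
    """
    root_path = Path(root).resolve()
    candidate = (root_path / requested).resolve()
    if candidate != root_path and root_path not in candidate.parents:
        raise SandboxEscape(f"{requested!r} resolves outside {root_path}")
    return candidate
```

Resolving before comparing is the key design choice: a purely textual prefix check on the requested string would miss both `..` segments and symlinks.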
In practice
Your CI agent runs untrusted PR branches and inherits the build runner's full environment, including the cloud credentials sitting in env vars, along with open egress to the internet. A contributor's PR adds a test that reads those secrets and POSTs them to the contributor's server, and the injection succeeds on the first try. Defense in depth assumes exactly this. Run agent tool execution in a container scoped to the one working directory, with an egress allowlist that blocks everything but the registries you need, so a successful compromise is a contained annoyance instead of a credential leak.
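A sketch of building that container invocation, assuming Docker is the runtime. The flags are standard `docker run` options; `sandbox_argv`, the image, and the paths are illustrative. `--network none` blocks all egress. Allowing only specific registries usually means attaching the container to a network whose sole route out is a filtering proxy, which would be passed here as a hypothetical network name via `network`. No `-e` or `--env-file` flags are passed, so the runner's credentials are not copied into the container.

```python
from __future__ import annotations

import os
import shlex


def sandbox_argv(image: str, workdir: str, cmd: list[str],
                 network: str = "none") -> list[str]:
    """Build a `docker run` argv that confines a tool to one directory."""
    host_dir = os.path.abspath(workdir)  # bind mounts need an absolute host path
    return [
        "docker", "run", "--rm",
        "--network", network,                 # "none" = no egress at all
        "--read-only",                        # container root filesystem is read-only
        "--tmpfs", "/tmp",                    # scratch space discarded with the container
        "--cap-drop", "ALL",                  # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "-v", f"{host_dir}:/workspace",       # the only host path the tool can see
        "-w", "/workspace",
        image,
        *cmd,
    ]


if __name__ == "__main__":
    argv = sandbox_argv("python:3.12-slim", "/ci/checkout", ["pytest", "-q"])
    print(shlex.join(argv))  # pass `argv` to subprocess.run to actually launch it
```

Returning an argv list rather than a shell string keeps untrusted values such as branch-derived paths from being reinterpreted by a shell.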
Apply it
- Run tool execution in an isolated environment scoped to a single working directory with no access to ambient secrets.
- Enforce an egress allowlist that blocks all outbound traffic except the specific destinations the task requires.
- Design assuming the injection succeeds, and verify that the worst reachable outcome is contained, not catastrophic.
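The egress allowlist in the second item boils down to a default-deny decision like the one below, which a filtering proxy or network policy would apply to every outbound request. This is a sketch: `egress_allowed` is a hypothetical name, and the registry hosts are an assumed example of what a CI job might need. Enforcement has to live outside the sandbox, since a compromised agent cannot be trusted to call the check itself.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Hypothetical allowlist: only the package registries this CI job needs.
EGRESS_ALLOWLIST = frozenset({
    "pypi.org",
    "files.pythonhosted.org",
    "registry.npmjs.org",
})


def egress_allowed(url: str, allowlist: frozenset[str] = EGRESS_ALLOWLIST) -> bool:
    """Allow only HTTPS on the default port to an exactly matching host."""
    try:
        parts = urlsplit(url)
        port = parts.port                 # raises ValueError on a malformed port
    except ValueError:
        return False                      # default-deny anything unparseable
    host = (parts.hostname or "").rstrip(".")  # .hostname is already lowercased
    return (
        parts.scheme == "https"
        and port in (None, 443)
        and host in allowlist
    )
```

Exact host matching is deliberate: suffix or substring matching would let `registry.npmjs.org.evil.example` through, and using `.hostname` rather than the raw netloc ignores `user@` prefixes that can disguise the real destination.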
The takeaway
Run agent tool execution in an isolated environment with constrained filesystem and network access, so a successful injection stays contained instead of turning catastrophic.