Law 29 · Evaluation & Measurement
Goodhart's Trap
When your eval becomes the goal, it stops measuring what you cared about.

The principle
When a measure becomes a target, it stops being a good measure. Optimize hard against any single metric and the agent learns to game its surface form, padding answers to please a verbosity-biased judge or overfitting a fixed eval set, while the underlying capability stalls or even slips. The number goes up. The thing you cared about doesn't.
Why it happens
An eval is a proxy for what you care about. Optimize against one proxy hard enough and the system learns the cheapest way to raise that score: longer answers for a verbosity-biased judge, format mimicry for a rubric, or memorized quirks in a fixed test set. Reward-hacking research shows this is a deep problem with narrow objectives, not a failure of cleverness. A rotating held-out set helps with memorization, but it does not fix a bad proxy. Use fresh cases, diverse signals, and human reality checks before believing the gain.
Watch for
- Your eval score is climbing steadily while real-user complaints stay flat or rise.
- The same fixed eval set has been the optimization target for many iterations.
- Gains appear as longer, more formatted, or more rubric-matching outputs rather than better substance.
In practice
You optimize a prompt against the same 200-case eval for a sprint, and the score climbs from 82% to 94%. Then users complain the agent feels worse. The system learned the surface of the test: longer answers, cleaner formatting, and patterns your judge rewards, while the underlying capability barely moved. Treat any metric you push on as suspect. Keep fresh held-out cases, compare against different signals, and re-validate on examples the optimizer never saw.
Apply it
- Keep a rotating, held-out eval the optimization loop never sees, and re-validate gains on it.
- Treat any metric you actively optimize as compromised and cross-check against fresh data.
- Watch for surface-form gaming such as padding or format-matching, and penalize it explicitly.
The takeaway
Treat any metric you actively optimize as suspect. Keep fresh held-out cases, cross-check against different signals, and re-validate your gains on examples the optimizer never saw.