Law 29 · Evaluation & Measurement

Goodhart's Trap

When your eval becomes the goal, it stops measuring what you cared about.

The principle

When a measure becomes a target, it stops being a good measure. Optimize hard against any single metric and the agent learns to game its surface form, padding answers to please a verbosity-biased judge or overfitting a fixed eval set, while the underlying capability stalls or even slips. The number goes up. The thing you cared about doesn't.

Why it happens

An eval is a proxy for what you care about. Optimize against one proxy hard enough and the system learns the cheapest way to raise that score: longer answers for a verbosity-biased judge, format mimicry for a rubric, or memorized quirks in a fixed test set. Reward-hacking research shows this is a deep problem with narrow objectives, not a failure of cleverness. A rotating held-out set helps with memorization, but it does not fix a bad proxy. Use fresh cases, diverse signals, and human reality checks before believing the gain.

Watch for

Your eval score is climbing steadily while real-user complaints stay flat or rise.
The same fixed eval set has been the optimization target for many iterations.
Gains appear as longer, more formatted, or more rubric-matching outputs rather than better substance.

In practice

You optimize a prompt against the same 200-case eval for a sprint, and the score climbs from 82% to 94%. Then users complain the agent feels worse. The system learned the surface of the test: longer answers, cleaner formatting, and patterns your judge rewards, while the underlying capability barely moved. Treat any metric you push on as suspect. Keep fresh held-out cases, compare against different signals, and re-validate on examples the optimizer never saw.

Apply it

Keep a rotating, held-out eval the optimization loop never sees, and re-validate gains on it.
Treat any metric you actively optimize as compromised and cross-check against fresh data.
Watch for surface-form gaming such as padding or format-matching, and penalize it explicitly.

The takeaway

Treat any metric you actively optimize as suspect. Keep fresh held-out cases, cross-check against different signals, and re-validate your gains on examples the optimizer never saw.

Sources and further reading

Get the audit kit Access the buyer edition Back to all 50 laws

The principle

Why it happens

Watch for

Apply it

Sources and further reading

Related laws