Law 13 · Retrieval & Memory

Relevant Beats Plenty

Near-misses poison context worse than random noise.

The principle

It runs against intuition: documents that are on-topic but do not answer the question hurt more than clearly irrelevant ones, because they look plausible and pull the generator toward answers that are adjacent to the truth but wrong. Stuffing more 'kind of relevant' chunks into the context tends to lower accuracy rather than improve coverage. Precision at the top beats breadth.

Why it happens

The most dangerous distractor is not random junk. It is a passage that sounds related but does not answer the question. The model treats it as evidence because it shares topic, vocabulary, or entity names, then anchors a plausible wrong answer to it. Some retrieval studies even find unrelated noise less harmful than near-misses, though the robust lesson is simpler: precision at the top matters. A few answer-bearing passages beat a padded context full of almost-relevant chunks. Retrieve broadly if needed, but rerank hard before generation.

Watch for

Raising top-k to improve coverage makes answers worse, not better.
Wrong answers are adjacent to the truth, like the right product family but the wrong model number.
Context is filled with many topically similar chunks and no reranking step trims them.
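
One way to catch the first symptom is a top-k sweep over a small labeled set. The sketch below is only an illustration under stated assumptions: `answer_fn`, `accuracy_by_top_k`, and `regressions` are hypothetical names, and exact-match scoring is an assumption that may be too strict for free-form answers.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: answer_fn(question, k) runs retrieval with top-k = k
# and returns the generated answer as a string.
AnswerFn = Callable[[str, int], str]


def accuracy_by_top_k(
    answer_fn: AnswerFn,
    eval_set: List[Tuple[str, str]],
    k_values: List[int],
) -> Dict[int, float]:
    """Exact-match accuracy for each top-k setting over (question, gold) pairs."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    results: Dict[int, float] = {}
    for k in k_values:
        correct = sum(
            answer_fn(question, k).strip().lower() == gold.strip().lower()
            for question, gold in eval_set
        )
        results[k] = correct / len(eval_set)
    return results


def regressions(acc: Dict[int, float]) -> List[int]:
    """Return the k values whose accuracy is below that of some smaller k."""
    flagged: List[int] = []
    best_so_far = float("-inf")
    for k in sorted(acc):
        if acc[k] < best_so_far:
            flagged.append(k)
        best_so_far = max(best_so_far, acc[k])
    return flagged
```

If `regressions` flags the larger k values, the extra chunks are probably acting as distractors rather than adding coverage.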

In practice

You bump top-k from 5 to 20 to improve coverage, and accuracy drops. The 15 new chunks are all topically adjacent (same product line, wrong model number), and they pull the answer toward a plausible lie. Clearly irrelevant chunks tend to get ignored; near-misses get believed. Do not pad context for recall's sake. Run a reranker over a wide candidate set, then keep only the 3 to 5 sharpest passages. A tight context beats a stuffed one.
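
A minimal sketch of the rerank-then-trim step, assuming you already have a wide candidate list from first-stage retrieval. `rerank_and_trim`, `score_fn`, and `toy_score` are hypothetical names. The toy scorer is plain token overlap and stands in for a real reranking model, such as a cross-encoder. It is not a recommendation.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical scorer: score_fn(query, passage) returns a float, where higher
# means the passage is more likely to answer the query.
ScoreFn = Callable[[str, str], float]


def rerank_and_trim(
    query: str,
    candidates: List[str],
    score_fn: ScoreFn,
    keep: int = 5,
    min_score: Optional[float] = None,
) -> List[Tuple[float, str]]:
    """Score every candidate, sort best-first, and keep only the top few."""
    scored = [(score_fn(query, passage), passage) for passage in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if min_score is not None:
        # Optional floor: drop passages the scorer does not trust,
        # even if that leaves fewer than `keep` passages.
        scored = [pair for pair in scored if pair[0] >= min_score]
    return scored[:keep]


def toy_score(query: str, passage: str) -> float:
    """Toy stand-in for a reranker: fraction of query tokens found in the passage."""
    q_tokens = set(query.lower().split())
    p_tokens = set(passage.lower().split())
    return len(q_tokens & p_tokens) / len(q_tokens) if q_tokens else 0.0


candidates = [
    "The X200 battery lasts 14 hours.",
    "Our office is closed on Sundays.",
    "The X100 battery lasts 10 hours.",
]
context = rerank_and_trim("X100 battery life", candidates, toy_score, keep=1)
```

Here the X200 passage is the near-miss: same product line, wrong model. With `keep=1` only the X100 passage reaches the generator. The `min_score` floor is a design choice for cases where even the best candidates are weak, so the context can shrink below `keep` instead of being padded.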

Apply it

Retrieve a wide candidate set but rerank and keep only the few highest-precision passages.
Tune for precision at the top of the ranking rather than maximizing recall at any cost.
Drop topically similar chunks that do not directly answer the query instead of including them for safety.
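
Tuning for precision at the top needs a number to tune against. The sketch below computes precision@k for one query, assuming you have hand-labeled which passage IDs actually contain the answer. `precision_at_k` and the IDs are hypothetical, and dividing by k (not by the number of passages returned) is one common convention, not the only one.

```python
from typing import List, Set


def precision_at_k(ranked_ids: List[str], answer_bearing: Set[str], k: int) -> float:
    """Fraction of the top-k ranked passages that actually contain the answer."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for pid in ranked_ids[:k] if pid in answer_bearing)
    return hits / k


# Hypothetical labels: only "a" and "b" answer the query; n1..n3 are near-misses.
answer_bearing = {"a", "b"}
before_rerank = ["n1", "a", "n2", "n3", "b"]
after_rerank = ["a", "b", "n1", "n2", "n3"]

p_before = precision_at_k(before_rerank, answer_bearing, k=2)
p_after = precision_at_k(after_rerank, answer_bearing, k=2)
```

Comparing the two values at the k you actually send to the generator shows whether the reranker is pushing answer-bearing passages ahead of the near-misses.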

The takeaway

Optimize for precision, not recall at any cost. Rerank hard and filter out the distractor chunks. A smaller, sharper context beats a padded one.
