Law 14 · Retrieval & Memory

Keyword Still Carries Weight

Pure semantic search quietly loses to a 40-year-old baseline.

The principle

Dense embedding retrievers often win in-domain but can lose to BM25 once queries step outside their training distribution. Exact-match terms, product codes, names, and rare jargon are where embeddings blur and plain keyword search shines. In-domain accuracy does not predict how well a retriever generalizes, and combining the two is how strong systems cut their retrieval failures.

Why it happens

Embeddings are good at meaning, but exact tokens can blur: SKUs, error codes, names, rare terms, and domain jargon. BM25 and other lexical methods are old, but they still win when the literal string matters. The BEIR benchmark showed that dense retrievers that do well in-domain can underperform BM25 out of distribution. The practical answer is hybrid retrieval: run lexical and semantic search together, then fuse and rerank the candidates. The two methods fail differently, so the combination catches cases that either one alone would miss.
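To make the exact-match advantage concrete, here is a minimal Okapi BM25 scorer in Python. It is a sketch under stated assumptions, not a production index: the tokenizer (lowercase, keep hyphenated codes such as `ax-4400-b` as one token), the defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant are illustrative choices. A document only earns score for a query token it contains verbatim, and rare tokens carry the largest IDF weight, which is why codes and jargon favor lexical search.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: lowercase, and keep hyphenated codes such as "ax-4400-b"
# together as a single token so exact identifiers survive indexing.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


class BM25:
    """Minimal in-memory Okapi BM25 (illustrative, not a production index)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]   # term counts per doc
        self.lens = [sum(tf.values()) for tf in self.tfs]  # doc lengths in tokens
        self.avgdl = sum(self.lens) / len(docs)
        df = Counter(term for tf in self.tfs for term in tf)  # document frequency
        n = len(docs)
        # Non-negative IDF variant: the rarer the term, the larger its weight.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], self.lens[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue  # only verbatim token matches contribute
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def search(self, query: str, k: int = 3) -> list[int]:
        # Highest score first; ties broken by document index.
        scored = sorted(((self.score(query, i), i) for i in range(len(self.tfs))),
                        key=lambda p: (-p[0], p[1]))
        return [i for s, i in scored[:k] if s > 0]


docs = [
    "Replacement filter for pump AX-4400-B, fits 2019 models",
    "Replacement filter for pump AX-4400-C, fits 2021 models",
    "How to clean a pump filter without tools",
]
index = BM25(docs)
print(index.search("AX-4400-B filter"))  # → [0, 1, 2]
```

The near-miss part `AX-4400-C` earns nothing for the code token, only the small weight of the common word `filter`, so the exact SKU wins by a wide margin.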

Watch for

Pure embedding search nails paraphrased demo questions but fails on exact codes, IDs, or product names in production.
Out-of-domain or jargon-heavy queries return near-identical-looking but wrong matches.
Retrieval was validated only on in-distribution examples similar to the embedding training data.
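A single aggregate metric hides exactly these failures. One way to surface them, sketched below under assumed names, is to tag each evaluation query with a slice label and report recall@k per slice. The `(query, relevant_id, slice)` case format, the slice names (for example `paraphrase`, `exact_id`, `out_of_domain`), and the `retrieve(query, k)` signature are hypothetical; adapt them to your own retriever and labeled data.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical eval case: (query, id of the relevant doc, slice label).
EvalCase = tuple[str, int, str]


def recall_at_k_by_slice(
    cases: list[EvalCase],
    retrieve: Callable[[str, int], list[int]],
    k: int = 5,
) -> dict[str, float]:
    """Per slice, the fraction of queries whose relevant doc is in the top k."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for query, relevant_id, slice_name in cases:
        totals[slice_name] += 1
        if relevant_id in retrieve(query, k)[:k]:
            hits[slice_name] += 1
    return {name: hits[name] / totals[name] for name in totals}


# Toy usage with a stub retriever that always returns the same three ids.
def stub_retrieve(query: str, k: int) -> list[int]:
    return [0, 1, 2][:k]


cases = [
    ("how do I clean the pump filter", 2, "paraphrase"),
    ("AX-4400-C", 1, "exact_id"),
    ("AX-4400-B", 7, "exact_id"),
]
print(recall_at_k_by_slice(cases, stub_retrieve, k=3))
# → {'paraphrase': 1.0, 'exact_id': 0.5}
```

A healthy-looking overall number can sit on top of a weak `exact_id` slice; if every labeled query resembles the `paraphrase` slice, the evaluation cannot see the failure at all.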

In practice

Your pure-embedding search nails paraphrased questions in the demo, then face-plants in production when a user searches for SKU 'AX-4400-B' or an error code, and the dense vectors blur it into a dozen near-identical part numbers. Embeddings smear exact tokens, IDs, names, and rare jargon. Default to hybrid: run BM25 alongside semantic search, fuse the results, and put a reranker on top. The 40-year-old lexical baseline is exactly what rescues your out-of-domain and exact-match queries.

Apply it

Run lexical and semantic retrieval in parallel and fuse their ranked lists rather than relying on embeddings alone.
Combine ranked results with a position-based fusion method that needs no score calibration between retrievers.
Add a reranker over the fused candidates to compound precision, especially for exact-match and out-of-domain queries.
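The three steps above can be sketched as follows. Reciprocal rank fusion (RRF) stands in for the position-based fusion method; it is one common choice, not the only one, and `k=60` is the value commonly used in practice. The retrievers and the reranker are plain callables with assumed signatures, so a BM25 index, a vector store, and a cross-encoder (or any other reranker) would need thin wrappers to fit. The `depth` and `top_n` defaults are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Assumed signatures (hypothetical, adapt to your stack):
Retriever = Callable[[str, int], list[str]]  # (query, depth) -> doc ids, best first
Reranker = Callable[[str, str], float]       # (query, doc_id) -> relevance score


def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list adds 1 / (k + rank) to a doc's score."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Only positions are used, so raw retriever scores never need calibrating.
    return sorted(scores, key=scores.__getitem__, reverse=True)


def hybrid_search(query: str, lexical: Retriever, dense: Retriever,
                  rerank: Reranker, depth: int = 50, top_n: int = 10) -> list[str]:
    # 1. Run lexical and semantic retrieval in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        lex = pool.submit(lexical, query, depth)
        sem = pool.submit(dense, query, depth)
        lexical_ids, dense_ids = lex.result(), sem.result()
    # 2. Fuse by rank position only, keeping a bounded candidate pool.
    candidates = rrf_fuse([lexical_ids, dense_ids])[:depth]
    # 3. Rerank the fused candidates and keep the best top_n.
    return sorted(candidates, key=lambda d: rerank(query, d), reverse=True)[:top_n]


# Toy usage: stubs stand in for a real BM25 index, vector store, and reranker.
def lexical_stub(query: str, n: int) -> list[str]:
    return ["sku-b", "sku-c", "manual"][:n]


def dense_stub(query: str, n: int) -> list[str]:
    return ["sku-c", "sku-a", "sku-b"][:n]


def exact_rerank(query: str, doc_id: str) -> float:
    return 1.0 if doc_id == "sku-b" else 0.0


print(rrf_fuse([lexical_stub("AX-4400-B", 3), dense_stub("AX-4400-B", 3)]))
# → ['sku-c', 'sku-b', 'sku-a', 'manual']
print(hybrid_search("AX-4400-B", lexical_stub, dense_stub, exact_rerank, depth=3, top_n=2))
# → ['sku-b', 'sku-c']
```

Because RRF reads only ranks, a BM25 score and a cosine similarity never have to be put on a common scale; a document that both lists rank well rises to the top, and the reranker then sorts out the final order among those candidates.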

The takeaway

Default to hybrid search, semantic plus keyword (BM25), instead of embeddings alone, especially for jargon, IDs, and out-of-domain queries. Add a reranker on top to compound the gains.
