Law 14 · Retrieval & Memory
Keyword Still Carries Weight
Pure semantic search quietly loses to a decades-old baseline.

The principle
Dense embedding retrievers often win in-domain but can lose to BM25 once queries step outside their training distribution. Exact-match terms (product codes, names, rare jargon) are where embeddings blur and plain keyword search shines. In-domain accuracy doesn't predict how well a retriever generalizes, and combining the two methods is how strong systems cut their retrieval failures.
Why it happens
Embeddings capture meaning well, but exact tokens tend to blur in them: SKUs, error codes, names, rare terms, and domain jargon. BM25 and other lexical methods are old, but because they score documents on the exact query terms and weight rare terms most heavily, they still win when the literal string matters. The BEIR benchmark showed that dense retrievers that do well in-domain can underperform BM25 out of distribution. The practical answer is hybrid retrieval: run lexical and semantic search together, then fuse and rerank the candidates. Because the two methods fail differently, the combination catches cases that either one misses on its own.
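To make "the literal string matters" concrete, here is a minimal in-memory BM25 scorer. It is a sketch under stated assumptions, not a production index: the hyphen-preserving tokenizer, the k1 = 1.5 and b = 0.75 defaults, the non-negative IDF variant, and the part numbers are all illustrative choices. A document scores only on query tokens it literally contains, so AX-4400-B does not leak credit to its near-identical neighbors.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: lowercase, and keep hyphenated codes such as "ax-4400-b"
# together as one token. Real analyzers (e.g. Lucene's) differ.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


class BM25:
    """Minimal Okapi BM25 over an in-memory list of documents."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        tokenized = [tokenize(d) for d in docs]
        self.doc_tf = [Counter(toks) for toks in tokenized]
        self.doc_len = [len(toks) for toks in tokenized]
        self.avgdl = sum(self.doc_len) / len(docs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for toks in tokenized for term in set(toks))
        n = len(docs)
        # Non-negative IDF variant: the rarer the term, the more it weighs.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.doc_tf[i], self.doc_len[i]
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no partial credit: only literal token matches count
            f = tf[term]
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def search(self, query: str, k: int = 3) -> list[int]:
        """Return indices of the top-k documents, best first."""
        order = sorted(range(len(self.doc_tf)),
                       key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


# Hypothetical catalog: three near-identical part numbers.
docs = [
    "Replacement filter AX-4400-A for the compact unit",
    "Replacement filter AX-4400-B for the industrial unit",
    "Replacement filter AX-4401-B for the industrial unit",
]
bm25 = BM25(docs)
print(bm25.search("AX-4400-B", k=1))  # [1]
```

The tokenizer choice carries much of the weight here: an analyzer that split AX-4400-B on hyphens would give partial credit to every document sharing "ax", "4400", or "b".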
Watch for
- Pure embedding search nails paraphrased demo questions but fails on exact codes, IDs, or product names in production.
- Out-of-domain or jargon-heavy queries return near-identical-looking but wrong matches.
- Retrieval was validated only on in-distribution examples similar to the embedding training data.
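One way to catch the last warning sign is to split the evaluation set by query type and report recall@k per slice instead of one blended number. The sketch below assumes a hypothetical eval-set format (query, set of relevant doc IDs, slice label) and a fake retriever that mimics the failure described above; a real evaluation would plug in its own retriever and labeled queries.

```python
from collections import defaultdict
from typing import Callable


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant doc IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def recall_by_slice(eval_set: list[dict], search: Callable[[str], list[str]],
                    k: int = 5) -> dict[str, float]:
    """Average recall@k per slice label instead of one blended number.

    Assumed item format: {"query": str, "relevant": set[str], "slice": str}.
    """
    per_slice: dict[str, list[float]] = defaultdict(list)
    for item in eval_set:
        hits = search(item["query"])
        per_slice[item["slice"]].append(recall_at_k(hits, item["relevant"], k))
    return {name: sum(vals) / len(vals) for name, vals in per_slice.items()}


# Hypothetical stand-in for a dense retriever: fine on paraphrases,
# but it returns near-miss part numbers for an exact ID.
fake_results = {
    "how do I reset the device": ["doc-reset", "doc-manual"],
    "AX-4400-B": ["doc-ax-4400-a", "doc-ax-4401-b"],
}
eval_set = [
    {"query": "how do I reset the device", "relevant": {"doc-reset"}, "slice": "paraphrase"},
    {"query": "AX-4400-B", "relevant": {"doc-ax-4400-b"}, "slice": "exact_id"},
]
scores = recall_by_slice(eval_set, lambda q: fake_results[q], k=2)
print(scores)  # {'paraphrase': 1.0, 'exact_id': 0.0}
```

A single blended average over these two queries would read 0.5 and hide the fact that every exact-ID query failed.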
In practice
Your pure-embedding search nails paraphrased questions in the demo, then face-plants in production when a user searches for SKU 'AX-4400-B' or an error code, and the dense vectors blur it into a dozen near-identical part numbers. Embeddings smear exact tokens, IDs, names, and rare jargon. Default to hybrid: run BM25 alongside semantic search, fuse the results, and put a reranker on top. The decades-old lexical baseline is exactly what rescues your out-of-domain and exact-match queries.
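The hybrid default can be sketched as a small pipeline: query both retrievers concurrently, merge their lists, and let a reranker order the merged pool. Everything here is a hypothetical stand-in: the retrievers and reranker are plain callables rather than any specific library's API, and the fusion step is a deliberately simple round-robin interleave (the position-based fusion described under Apply it is the stronger choice).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Retriever = Callable[[str], list[str]]   # query -> ranked doc IDs, best first
Reranker = Callable[[str, str], float]   # (query, doc ID) -> relevance score


def interleave(*ranked_lists: list[str]) -> list[str]:
    """Round-robin merge of ranked lists, keeping each doc's first appearance."""
    fused: list[str] = []
    seen: set[str] = set()
    for depth in range(max(map(len, ranked_lists), default=0)):
        for ranked in ranked_lists:
            if depth < len(ranked) and ranked[depth] not in seen:
                seen.add(ranked[depth])
                fused.append(ranked[depth])
    return fused


def hybrid_search(query: str, lexical: Retriever, dense: Retriever,
                  rerank: Reranker, candidates: int = 50, k: int = 10) -> list[str]:
    # 1. Query both retrievers concurrently.
    with ThreadPoolExecutor(max_workers=2) as executor:
        lex_future = executor.submit(lexical, query)
        dense_future = executor.submit(dense, query)
        lex_hits, dense_hits = lex_future.result(), dense_future.result()
    # 2. Fuse the two ranked lists into a single, deduplicated candidate pool.
    pool = interleave(lex_hits, dense_hits)[:candidates]
    # 3. Let the (slower, more precise) reranker order only that pool.
    return sorted(pool, key=lambda doc: rerank(query, doc), reverse=True)[:k]


# Hypothetical stand-ins: the dense side blurs the SKU into near-misses,
# the lexical side finds it, and the toy reranker rewards an exact ID match.
def lexical(query: str) -> list[str]:
    return ["ax-4400-b", "ax-4400-b-manual"]


def dense(query: str) -> list[str]:
    return ["ax-4400-a", "ax-4401-b", "ax-4400-b"]


def rerank(query: str, doc: str) -> float:
    return 1.0 if doc == query.lower() else 0.0


results = hybrid_search("AX-4400-B", lexical, dense, rerank, k=3)
print(results)  # ['ax-4400-b', 'ax-4400-a', 'ax-4400-b-manual']
```

The reranker only ever sees the fused pool, so its cost scales with `candidates` rather than with corpus size; that is what makes an expensive reranker affordable on top of two cheap retrievers.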
Apply it
- Run lexical and semantic retrieval in parallel and fuse their ranked lists rather than relying on embeddings alone.
- Fuse the ranked lists with a rank-based method such as Reciprocal Rank Fusion (RRF), which uses only each document's position and so needs no score calibration between retrievers.
- Add a reranker over the fused candidates to sharpen precision at the top of the list, especially for exact-match and out-of-domain queries.
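The position-based fusion step above is commonly implemented as Reciprocal Rank Fusion (RRF). Below is a minimal sketch; the document IDs are hypothetical, and k = 60 is the conventional smoothing constant rather than a value tuned for any particular corpus.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]],
                           k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion (RRF).

    Each document earns 1 / (k + rank) from every list it appears in, with
    1-based ranks. Only positions are used, so BM25 scores and cosine
    similarities never have to be put on a common scale.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sort is stable).
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical top-3 lists from each retriever for the query "AX-4400-B".
bm25_hits = ["ax-4400-b", "ax-4400-manual", "ax-4401-b"]
dense_hits = ["ax-4400-a", "ax-4400-b", "filter-guide"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
for doc_id, score in fused:
    print(f"{doc_id:<16} {score:.4f}")
```

Here ax-4400-b, which both retrievers rank near the top, finishes first even though no raw retriever score was consulted. A large k flattens the gap between adjacent ranks, so agreement across lists counts for more than a single first place; the top of this fused list is what the reranker would then see.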
The takeaway
Default to hybrid search, semantic plus keyword (BM25), instead of embeddings alone, especially for jargon, IDs, and out-of-domain queries. Add a reranker on top to compound the gains.