Blog

Short pieces on hybrid search, embeddings, reranking, and retrieval evaluation. Each post relates to a chapter of Designing Hybrid Search Systems.

Part I · Chapter 1 · 4 min read
The Vocabulary Mismatch Problem: Why BM25 Fails Silently

Users pick the same word for the same concept less than 20% of the time. Keyword search cannot bridge that gap, and most systems never measure it.

April 20, 2026

Part I · Chapter 2 · 5 min read
Vector Search Always Returns Something, Even When It Should Not

A nearest neighbor always exists, so vector search never returns zero results. That is precisely why its failures are so hard to notice.

April 13, 2026

Part I · Chapter 3 · 6 min read
Hybrid Search vs Vector Search: Where Each Actually Wins

Hybrid retrieval consistently outperforms keyword-only and vector-only on public benchmarks, but the case for hybrid is about complementary failure sets, not averages.

April 6, 2026

Part II · Chapter 4 · 5 min read
Three Hybrid Search Architecture Patterns and Their Trade-offs

Parallel, sequential, and unified hybrid search architectures are not interchangeable. They make different bets on latency, complexity, and debuggability.

March 30, 2026

Part II · Chapter 5 · 5 min read
Query Understanding: Why One Retrieval Path Is Never Enough

Treating every query identically wastes compute on easy queries and under-serves hard ones. Query classification decides which retrieval path to invoke.

March 23, 2026

Part II · Chapter 6 · 5 min read
Cross-Encoder Reranking: The Highest-Leverage Stage in Hybrid Search

A good reranker can fix a mediocre first-stage retriever, but only if it fits inside a tight latency budget. Pick the candidate set and the model together.

March 16, 2026

Part II · Chapter 7 · 6 min read
Elasticsearch vs Pinecone vs Weaviate: No Platform Wins on Everything

Every search platform now advertises hybrid support. The implementations behind those APIs differ, and so does the platform that is right for your team.

March 9, 2026

Part III · Chapter 8 · 5 min read
Embedding Model Selection: MTEB Rank Is Not Enough

Benchmark leaderboards are a starting point for picking an embedding model, not a decision. Domain fit matters more than MTEB rank, and the commitment is harder to reverse than it looks.

March 2, 2026

Part III · Chapter 9 · 5 min read
The Negatives You Train On Decide Your Embedding Model's Ceiling

Given fixed positives, the choice of negatives is the highest-leverage lever left in embedding fine-tuning, and getting it wrong quietly poisons retrieval quality.

February 23, 2026

Part III · Chapter 10 · 5 min read
Reranker Distillation: Cross-Encoder Quality at a Fraction of the Latency

Distilling a large cross-encoder into a smaller model approximates its quality at a fraction of the serving cost. It is how most production rerankers actually get deployed.

February 16, 2026

Part IV · Chapter 11 · 5 min read
Search Quality Metrics: Optimizing One Number Is Dangerous

NDCG, MRR, and recall do not measure the same thing. Picking a single metric to optimize guarantees you will ship something that regresses on the others.

February 9, 2026

Part IV · Chapter 12 · 5 min read
LLM-as-Judge for Search: Scalable, Biased, Calibratable

LLM judges reliably rank systems that are far apart in quality, but their agreement collapses on close comparisons, exactly where leaderboard rank matters most. The evidence, and what it means for offline evaluation.

February 2, 2026

Part IV · Chapter 13 · 5 min read
Interleaving vs A/B Tests: Why Ranking Experiments Are Different

Interleaving experiments detect ranking quality differences with far fewer users than a standard A/B test, which matters when your traffic is limited or your effects are small.

January 26, 2026

Part V · Chapter 14 · 5 min read
Embedding Indexing Cost Is Where Your Money Actually Goes

In a hybrid pipeline at scale, embedding computation and the vector index dominate cost. Stale embeddings are a second, quieter bill your users pay in quality.

January 19, 2026

Part V · Chapter 15 · 5 min read
HNSW Parameter Tuning: M, efConstruction, efSearch Explained

The three HNSW knobs (M, efConstruction, efSearch) move your recall-latency curve more than most teams realize. Pick defaults deliberately, not because the library shipped them.

January 12, 2026

Part V · Chapter 16 · 5 min read
Embedding Drift Monitoring: Search-Specific Model Degradation

A search system's quality can degrade while latency, error rate, and every other standard dashboard metric stay flat. Embedding drift monitoring is one of the pieces standard ML observability tends to miss.

January 5, 2026

Part V · Chapter 17 · 5 min read
Vector Cost Optimization: Matryoshka and Quantization Without the Hype

Stacked compression can cut vector index RAM by up to 192x, but the quality losses are non-additive. A validation workflow is the only way to find the Pareto point.

December 29, 2025

Part VI · Chapter 18 · 5 min read
In RAG, Retrieval Quality Beats Generation Quality Per Dollar

Upgrading the LLM is the most visible decision in a RAG pipeline. Upgrading retrieval usually moves output quality more, and for less money.

December 22, 2025

Part VI · Chapter 19 · 6 min read
E-Commerce Hybrid Search: Semantic Meets Structured Filtering

A product query like 'red running shoes under $120' mixes semantic intent with exact filtering. Neither pure keyword nor pure vector search handles that, and hybrid is only a partial answer.

December 15, 2025

Part VI · Chapter 20 · 6 min read
Enterprise Search Access Control: Decide at Index Time, Not Query Time

Enforcing document-level access control inside a vector search index is an architectural decision, not a runtime filter. Getting it wrong leaks data in subtle ways.

December 8, 2025