Table of Contents
Six parts, twenty chapters, three appendices. Parts II, III, V, and VI are designed to stand on their own, so readers can enter at the layer of the stack that matters most to them. Each chapter entry below includes a short summary so you can tell whether it covers what you need.
Part I
Why Hybrid Search
After Part I, you'll understand exactly where keyword and vector search fail and have a decision framework for when hybrid retrieval is worth the complexity.
Readers evaluating whether hybrid search is worth the investment, or building the case for a migration, should start here. Readers already convinced that hybrid search is the right approach can skip to Part II.
Chapter 1
The Limits of Keyword Search
For three decades, every mainstream open-source search engine has relied on the same core idea: match the terms in a query against an inverted index and rank by a scoring function like BM25. This chapter explains how BM25 works, why that approach has been so durable, and the specific, well-documented ways it fails silently on a substantial fraction of real user queries.
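As a preview of the scoring function Chapter 1 unpacks, here is a minimal sketch of BM25 in Python. The function name, argument shapes, and the Lucene-style non-negative IDF variant are illustrative choices, not definitions from the chapter:

```python
import math

def bm25_score(query_terms, doc_terms, doc_freqs, n_docs, avg_dl,
               k1=1.2, b=0.75):
    """Score one document against a query with a BM25-style formula.

    doc_freqs maps each term to the number of documents containing it;
    k1 and b are the usual saturation and length-normalization knobs.
    """
    dl = len(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)  # term frequency in this document
        if tf == 0:
            continue
        df = doc_freqs.get(term, 0)
        # Lucene-style IDF, shifted so it never goes negative
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl / avg_dl))
    return score
```

Note that the score depends only on exact term overlap: a document that says "sneakers" contributes nothing to a query for "running shoes," which is precisely the vocabulary-mismatch failure the chapter documents.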
Chapter 2
The Limits of Vector Search
Vector search promises to solve the vocabulary mismatch problem by matching meaning instead of words, and on aggregate benchmarks it has decisively overtaken BM25. In exchange, it introduces a different category of failures (entity confusion, hallucinated similarity, and blindness to negation) whose silent nature often makes them more difficult to surface and correct than the ones BM25 creates.
Chapter 3
The Case for Hybrid
Where each retrieval paradigm breaks down is not arbitrary: lexical and dense methods miss different queries in predictable, measurable ways, and the overlap between their failure sets is small. This chapter develops that observation, walks through the three main fusion approaches (Reciprocal Rank Fusion, weighted linear interpolation, and learned fusion), surveys benchmark evidence for hybrid consistently beating either approach alone, and offers a framework for deciding when the added complexity earns its keep.
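The simplest of the three fusion approaches named above, Reciprocal Rank Fusion, fits in a few lines. This is an illustrative sketch (the function name is ours; k=60 is the commonly cited default constant), not the book's reference implementation:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Each ranking is an ordered list, best result first. A document's fused
    score is the sum over lists of 1 / (k + rank); k damps the influence
    of top ranks so no single list dominates.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF uses only ranks, not raw scores, it needs no calibration between the lexical and dense scorers, which is much of its appeal relative to weighted linear interpolation.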
Part II
Architecture
After Part II, you'll be able to design a hybrid search system on paper and choose the platform to build it on.
Engineers designing a new search system or re-architecting an existing one should start here.
Chapter 4
Hybrid Search Architecture Patterns
Three architectural shapes dominate hybrid search: parallel retrieval with late fusion, sequential pipelines, and unified single-pass indexes. Each makes different bets about latency, operational complexity, debuggability, and how the system scales. The chapter compares the three shapes and develops a reference pipeline, annotated with per-stage latency budgets, that anchors the architecture discussion throughout the rest of the book.
Chapter 5
Query Understanding
Everything a retrieval pipeline does downstream is bounded by how well the system interprets the query up front. This chapter lays out the query understanding layer piece by piece, covering retrieval routing, intent classification, entity recognition, expansion, spell correction, and synonym handling, and then reframes the raw query log as a product-level feedback signal.
Chapter 6
The Reranking Stage
Fusion produces a short list that is usually close to correct but rarely in the right order. Reranking takes that short list, rescores it with a heavier model, and fixes the ordering. The chapter surveys cross-encoders, late-interaction architectures like ColBERT, and LLM-based rerankers, and examines why stacking additional rerankers yields diminishing returns once a fixed latency budget is in play.
Chapter 7
Choosing Your Search Platform
Every search platform now claims hybrid support, but the implementations behind those APIs differ substantially in architecture, fusion methods, filtering behavior, and cost structure. This chapter provides a vendor-neutral comparison of how each major platform implements hybrid search and decision frameworks organized by use case, scale, and team capability.
Part III
Models
After Part III, you'll know which embedding and reranker models to select, when to fine-tune, and how to train domain-specific models.
ML engineers and RAG team leads who already have a search system and want to improve its model layer should enter here. This part assumes some familiarity with training neural networks.
Chapter 8
Embedding Model Selection
No other choice in a hybrid pipeline has comparable blast radius: once documents are embedded, switching models means reindexing the entire corpus. The chapter maps the current model landscape, unpacks the benchmarks teams rely on to compare options, explains how embedding dimensionality feeds into storage and query-time cost, and offers a repeatable evaluation recipe for validating candidates on your own data before you commit.
Chapter 9
Fine-Tuning Embeddings for Your Domain
When no off-the-shelf embedding model meets the quality bar on domain-specific evaluation, fine-tuning is the next step. This chapter walks through deciding whether the investment will pay off, assembling the dataset, choosing loss functions and training hyperparameters, building a hard-negative mining loop, and validating real gains on held-out domain evaluations.
Chapter 10
Choosing and Training Reranker Models
Because a reranker evaluates a query and a candidate document together rather than encoding them in isolation, it can capture fine-grained relevance signals that bi-encoders miss, at a substantial inference cost. The chapter recommends sensible starting checkpoints, walks through distilling them into smaller and faster variants, explains domain adaptation strategies, and defines the conditions under which training a custom reranker from scratch is actually the right call.
Part IV
Evaluation
After Part IV, you'll have a complete methodology for measuring search quality, from offline metrics through production A/B testing.
Chapter 11
Search Quality Metrics
Few product surfaces admit as much rigorous measurement as ranked retrieval does, and decades of IR research since the Cranfield and TREC eras have produced a stable set of evaluation tools. This chapter works through the core metric families (precision, recall, NDCG, MRR, and their common variants) and lays out a selection framework for matching each metric to the product question it actually answers.
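To make one of the metric families above concrete, here is a small sketch of Mean Reciprocal Rank; the function name and input shapes are illustrative assumptions, not the chapter's API:

```python
def mean_reciprocal_rank(results, relevant):
    """MRR over a set of queries: the average of 1/rank of the first
    relevant hit, or 0 when a query returns nothing relevant.

    results:  {query: ranked list of doc ids, best first}
    relevant: {query: set of relevant doc ids}
    """
    total = 0.0
    for query, ranked in results.items():
        rel = relevant.get(query, set())
        rr = 0.0
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in rel:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(results)
```

MRR answers a narrow product question, how quickly the user reaches the first good result, which is why the chapter pairs each metric with the question it actually answers rather than treating them as interchangeable.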
Chapter 12
Building an Evaluation Pipeline
A metric is only as useful as the pipeline that computes it on every code change, with enough coverage to catch regressions before users do. This chapter shows how to assemble golden test sets, scale relevance labeling with LLM-based judges, automate offline evaluation runs, stratify results to surface per-segment regressions hiding inside an aggregate win, and wire the whole thing into CI gates for deployments.
Chapter 13
Online Evaluation and Experimentation
An offline harness forecasts relevance changes; live traffic is what confirms them. This chapter covers A/B testing tailored to ranked retrieval, interleaving as a higher-sensitivity alternative, the business-level metrics that tie search quality to revenue and engagement, guardrail metrics that catch regressions the primary metric misses, and the statistical gotchas that make search experiments particularly easy to get wrong.
Part V
Production Operations
After Part V, you'll know how to index at scale, meet latency budgets, monitor quality, and manage infrastructure cost.
Platform engineers, SREs, and MLOps teams running hybrid search in production should enter here.
Chapter 14
Indexing at Scale
Running both a lexical and a vector index on the same corpus is strictly harder than running either one alone: inverted indexes prefer a steady stream of small writes, while ANN structures prefer infrequent large rebuilds, and a hybrid system must satisfy both at the same time. The chapter covers the batch-versus-incremental trade-off, refresh strategies that keep the system queryable during updates, containing the cost of embedding recomputation, evolving schemas as models and fields change, and isolating tenants on shared infrastructure.
Chapter 15
Latency, Throughput, and Scaling
Every 100 milliseconds of added search latency measurably reduces user engagement, and hybrid pipelines must fit multiple retrieval stages into a fixed latency budget. This chapter explains how to allocate that budget across stages, use caching effectively, scale horizontally, tune ANN index parameters, and manage the long tail of slow queries.
Chapter 16
Monitoring and Observability
Production search quality erodes in ways no static test set can anticipate, as query distributions change, embedding spaces drift, and the indexed corpus evolves. This chapter assembles four complementary monitoring layers (query analytics dashboards, drift detection on embeddings, alerting thresholds tied to SLOs, and explicit user feedback capture) that together create a closed loop between what the system does in production and how engineers decide to improve it.
Chapter 17
Cost Optimization
Vectors dominate the economics of hybrid search: their memory footprint grows linearly with the number of documents, with embedding dimensionality, and with the overhead of the ANN index structure itself. The chapter decomposes spending into indexing compute, storage, query-time compute, and operational overhead, then works through the main levers for reducing the bill: lower-dimensional models, quantization schemes, tiered storage, and the architectural question of whether to build the platform or rent it.
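The linear growth described above lends itself to a back-of-envelope estimate. The numbers below (4-byte floats, a 1.5x index overhead factor approximating ANN graph structures) are illustrative assumptions, not benchmarks from the chapter:

```python
def vector_index_memory_gb(n_docs, dims, bytes_per_value=4.0,
                           index_overhead=1.5):
    """Rough RAM estimate for a dense vector index.

    Raw vector bytes grow linearly in document count and dimensionality;
    index_overhead approximates the extra space an ANN structure adds
    on top of the raw vectors.
    """
    raw_bytes = n_docs * dims * bytes_per_value
    return raw_bytes * index_overhead / 1024**3

# Ten million 768-dimensional float32 vectors land in the tens of GB,
# which is why quantization and lower-dimensional models are the first
# cost levers the chapter reaches for.
estimate = vector_index_memory_gb(10_000_000, 768)
```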
Part VI
Applied Domains
After Part VI, you'll have domain-specific playbooks for the three largest hybrid search deployment categories.
Each chapter in Part VI is designed to stand alone as a guide for practitioners in that domain.
Chapter 18
Hybrid Search for RAG Pipelines
Choosing an LLM matters far less in a RAG system than ensuring the retrieval layer surfaces the right passages at the right granularity. This chapter adapts the general hybrid retrieval architecture to the RAG setting, where the retrieval unit becomes a chunk rather than a full document, the downstream consumer is an LLM instead of a human skimming a result list, and the success bar moves from ranking quality toward generation faithfulness and groundedness.
Chapter 19
E-Commerce Product Search
A product catalog is not a document corpus: each product is a semi-structured record of free-text fields plus typed attributes, and a query like "red running shoes under $120" mixes natural-language intent with hard filter conditions. This chapter works through how to weave semantic retrieval together with attribute filtering, resolve the pre-filter versus post-filter tension at catalog scale, layer personalization onto the ranking stack, and tie relevance metrics directly back to conversion and revenue.
Chapter 20
Enterprise Knowledge Search
No other hybrid search domain presents the same operational surface area as enterprise knowledge: content lives in dozens of source systems that were built in isolation and never intended to interoperate. This chapter addresses retrieval over heterogeneous document types, enforcing per-document access control inside vector indexes, meeting regulatory and audit requirements, and building the connector and ingestion plumbing that makes the content reachable in the first place.
Appendices
- Appendix A: Mathematical Foundations Quick Reference
- Appendix B: Benchmark Datasets for Search Evaluation
- Appendix C: Migration Playbook