Problem
A leading energy company needed a Q&A system over 50,000+ internal technical documents — drilling reports, well completion records, and regulatory filings. Off-the-shelf dense retrieval systems failed on exact technical terminology (e.g. well IDs, formation names, equipment codes) where BM25-style exact match is critical.
Approach
The pipeline combines two-stage hybrid retrieval (parallel retrieval, then reranking) with a final generation stage:
- Stage 1 (parallel retrieval): a FAISS HNSW index handles dense semantic search while an Elasticsearch BM25 index handles keyword matching. The top k=20 results from each are merged and deduplicated.
- Stage 2 (reranking): a cross-encoder (ms-marco-MiniLM-L-6-v2, fine-tuned on domain data) scores the merged candidates (up to 40 after deduplication) and returns the top 5.
- Stage 3 (generation): GPT-4o produces the final answer from the retrieved context, citing source document IDs.
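The merge, deduplicate, and rerank steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the `Hit` shape, the function names, the interleaved merge order, and the toy `overlap_score` scorer are all hypothetical. In the real pipeline the scorer would be the fine-tuned cross-encoder, and the two hit lists would come from FAISS and Elasticsearch.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Hit = Tuple[str, str]  # (doc_id, text); hypothetical shape for a retrieved passage
ScoreFn = Callable[[List[Tuple[str, str]]], List[float]]


def merge_and_dedupe(dense_hits: Sequence[Hit], bm25_hits: Sequence[Hit]) -> List[Hit]:
    """Interleave two ranked lists, keeping the first occurrence of each doc_id."""
    seen: Dict[str, str] = {}  # dicts preserve insertion order (Python 3.7+)
    for i in range(max(len(dense_hits), len(bm25_hits))):
        for hits in (dense_hits, bm25_hits):
            if i < len(hits):
                doc_id, text = hits[i]
                seen.setdefault(doc_id, text)
    return list(seen.items())


def rerank(query: str, candidates: Sequence[Hit], score_fn: ScoreFn, top_n: int = 5) -> List[Hit]:
    """Score every (query, passage) pair and return the top_n highest-scoring hits."""
    scores = score_fn([(query, text) for _, text in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [hit for hit, _ in ranked[:top_n]]


def overlap_score(pairs: List[Tuple[str, str]]) -> List[float]:
    """Toy stand-in for a cross-encoder: counts query tokens found in the passage."""
    return [float(len(set(q.lower().split()) & set(t.lower().split()))) for q, t in pairs]


# Made-up example data, for illustration only.
dense = [("d1", "casing pressure test summary"), ("d2", "well W-101 completion record")]
bm25 = [("d2", "well W-101 completion record"), ("d3", "regulatory filing for well W-101")]

merged = merge_and_dedupe(dense, bm25)  # d2 appears in both lists but is kept once
top = rerank("W-101 completion record", merged, overlap_score, top_n=2)
print([doc_id for doc_id, _ in top])
```

Because the scorer is passed in as a callable, the toy function can be swapped for a model-backed one that takes the same list of (query, passage) pairs without changing the merge logic.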
Results
Evaluated on a manually curated 200-question benchmark:
- Answer faithfulness (human-rated): 78% (hybrid) vs 66% (dense-only) — +12 pp
- Retrieval recall@5: 0.84 (hybrid) vs 0.71 (dense-only)
- Latency (p95): 1.8s end-to-end (including reranker)
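For readers unfamiliar with the retrieval metric, here is one common way to compute recall@k. The source does not say which definition was used, so this sketch assumes the per-question form (fraction of labelled relevant documents found in the top k, averaged over questions). The function name and the data layout are hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int = 5) -> float:
    """Mean over questions of |relevant docs in top-k| / |relevant docs|."""
    per_question = []
    for qid, gold in relevant.items():
        if not gold:
            continue  # skip questions with no labelled relevant documents
        top_k = set(retrieved.get(qid, [])[:k])
        per_question.append(len(gold & top_k) / len(gold))
    return sum(per_question) / len(per_question) if per_question else 0.0


# Made-up toy data: ranked doc IDs per question, and the labelled relevant set.
retrieved = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"]}
relevant = {"q1": {"a", "c"}, "q2": {"w"}}

print(recall_at_k(retrieved, relevant, k=2))  # q1 finds 1 of 2, q2 finds 0 of 1
print(recall_at_k(retrieved, relevant, k=3))  # q1 finds 2 of 2, q2 finds 0 of 1
```

An alternative definition counts a question as a hit if any relevant document appears in the top k. The two agree when each question has exactly one relevant document.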
Tech stack
LangChain, FAISS, Elasticsearch 8, FastAPI, OpenAI API, HuggingFace Transformers, Docker, Redis (answer cache).