# RAG hybrid search returns no results because dense and sparse scores are on different scales

- **ID:** `llm/rag-hybrid-search-score-mismatch`
- **Domain:** llm
- **Category:** data_error
- **Verification:** ai_generated
- **Fix Rate:** 80%

## Root Cause

In hybrid search (dense + sparse), dense similarity scores (e.g., cosine similarity, typically 0.0-1.0 for text embeddings) and sparse BM25/SPLADE scores (unbounded, often around 0-20) are fused without first being normalized to a common range. The larger-scale component then dominates the fused ranking, or a threshold tuned for one scale filters out every result.

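The thresholding failure can be reproduced with a few made-up numbers. The sketch below is illustrative only: the document IDs, the scores, the `fuse_raw` helper, and the `3.0` cutoff are all assumptions, not measurements from any of the libraries listed in this entry. It fuses raw cosine and BM25 scores with an equal-weight sum and applies one fixed threshold to two queries.

```python
def fuse_raw(dense, sparse, alpha=0.5):
    """Weighted sum of raw, unnormalized scores keyed by document ID."""
    return {doc: alpha * dense[doc] + (1 - alpha) * sparse[doc] for doc in dense}

# Query 1: strong keyword overlap, so the (hypothetical) BM25 scores are large.
q1 = fuse_raw({"a": 0.91, "b": 0.35, "c": 0.62}, {"a": 1.2, "b": 14.8, "c": 6.5})
# Query 2: little keyword overlap, so the BM25 scores are small.
q2 = fuse_raw({"a": 0.88, "b": 0.80, "c": 0.15}, {"a": 0.9, "b": 0.4, "c": 0.0})

threshold = 3.0  # hypothetical cutoff tuned on queries that look like query 1

hits_q1 = [doc for doc, score in q1.items() if score >= threshold]
hits_q2 = [doc for doc, score in q2.items() if score >= threshold]
print(hits_q1)  # ['b', 'c']
print(hits_q2)  # []
```

Query 2 returns nothing even though document `a` has a cosine similarity of 0.88, because the fused score range is set almost entirely by the BM25 values for that query.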
## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| langchain==0.2.5 | active | — | — |
| chromadb==0.5.0 | active | — | — |
| qdrant-client==1.9.0 | active | — | — |
| elasticsearch==8.13.0 | active | — | — |

## Workarounds

1. **Normalize both score sets to [0,1] using min-max scaling before fusion.** (90% success)
   ```python
   def normalize_scores(scores):
       """Min-max scale a list of scores to [0, 1]."""
       if not scores:
           return []
       min_s, max_s = min(scores), max(scores)
       if max_s == min_s:
           # All scores are equal, so there is no spread to rescale.
           return [0.0] * len(scores)
       return [(s - min_s) / (max_s - min_s) for s in scores]

   # dense_scores and sparse_scores must be aligned:
   # index i refers to the same document in both lists.
   alpha = 0.5  # weight of the dense component; tune for your data
   dense_scores = normalize_scores(dense_scores)
   sparse_scores = normalize_scores(sparse_scores)
   hybrid_scores = [alpha * d + (1 - alpha) * s for d, s in zip(dense_scores, sparse_scores)]
   ```
2. **Use rank-based fusion (RRF) instead of score-based fusion.** RRF combines ranks directly, avoiding scale issues entirely. (95% success)
   ```python
   def reciprocal_rank_fusion(dense_ranks, sparse_ranks, k=60):
       """Fuse two ranked lists of document IDs (best first) with RRF."""
       scores = {}
       for ranking in (dense_ranks, sparse_ranks):
           for rank, doc_id in enumerate(ranking, start=1):
               # Each list contributes 1 / (k + rank), with rank starting at 1.
               scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (k + rank)
       return sorted(scores.items(), key=lambda x: x[1], reverse=True)
   ```
3. **Use the vector database's built-in hybrid fusion or score normalization, if it provides one.** (82% success)
   Scalar quantization is not such a feature: the Qdrant setting `quantization_config=models.ScalarQuantization(scalar=models.ScalarQuantizationConfig(type=models.ScalarType.INT8))` compresses stored vectors to int8 and does not rescale dense and sparse scores to a common range, so it does not fix this mismatch by itself. Qdrant 1.10 and later (newer than the `qdrant-client==1.9.0` listed above) provide server-side fusion through the Query API, for example `models.FusionQuery(fusion=models.Fusion.RRF)`.

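The list-based snippet in workaround 1 assumes both score lists are aligned by document. Dense and sparse retrievers often return different candidate sets, so a variant keyed by document ID may be easier to apply. This is a minimal sketch, not a definitive implementation: `min_max` and `hybrid_fuse` are hypothetical helper names, the scores are made up, and treating a document that is missing from one result set as scoring 0.0 there is an assumption you may want to change.

```python
def min_max(scores):
    """Min-max scale a {doc_id: score} dict to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # No spread to rescale; mirror the list-based version and return zeros.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_fuse(dense, sparse, alpha=0.5):
    """Weighted sum of normalized scores; alpha is the dense weight."""
    d, s = min_max(dense), min_max(sparse)
    fused = {
        # Assumption: a document absent from one result set scores 0.0 there.
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
        for doc in d.keys() | s.keys()
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical scores: "a" is dense-only, "d" is sparse-only.
dense = {"a": 0.91, "b": 0.35, "c": 0.62}
sparse = {"b": 14.8, "c": 6.5, "d": 2.0}
ranked = hybrid_fuse(dense, sparse, alpha=0.6)
print([doc for doc, _ in ranked])
```

Because min-max scaling depends on the minimum and maximum of each result set, a single outlier score can compress the rest of that set toward 0; rank-based fusion (workaround 2) does not have this sensitivity.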
## Dead Ends

- **Lowering the similarity threshold to 0.0 to force results** — This returns all documents regardless of relevance, defeating the purpose of hybrid search and polluting the context with irrelevant documents. (80% fail)
- **Switching to only dense search** — This eliminates the sparse component, which may be crucial for keyword-based retrieval in domains like legal or medical where specific terms matter. (65% fail)
- **Raising the sparse component's fusion weight to 0.9 (i.e., `alpha = 0.1` in `alpha * dense + (1 - alpha) * sparse`)** — This does not solve the scale mismatch; it only shifts dominance. The raw scores are still on different scales, so the fusion remains skewed. (70% fail)

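The re-weighting dead end can be checked with a small sweep over the fusion weight. The scores and the `top_doc` helper below are made up for illustration. With raw scores, the document ranked first stays the BM25 favorite even when the dense component gets 90% of the weight, although the dense retriever ranks that document last.

```python
dense = {"a": 0.91, "b": 0.35, "c": 0.62}   # hypothetical cosine similarities
sparse = {"a": 1.2, "b": 14.8, "c": 6.5}    # hypothetical BM25 scores


def top_doc(alpha):
    """Top document under a raw weighted sum; alpha is the dense weight."""
    fused = {doc: alpha * dense[doc] + (1 - alpha) * sparse[doc] for doc in dense}
    return max(fused, key=fused.get)


for alpha in (0.1, 0.5, 0.9):
    print(alpha, top_doc(alpha))  # prints "b" for every alpha in this sweep
```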