llm data_error ai_generated true

RAG hybrid search returns no results because dense and sparse scores are on different scales

ID: llm/rag-hybrid-search-score-mismatch

Fix rate: 80%
Confidence: 87%
Evidence: 1
First seen: 2024-05-22

Version Compatibility

Version                 Status   Introduced   Deprecated   Notes
langchain==0.2.5        active
chromadb==0.5.0         active
qdrant-client==1.9.0    active
elasticsearch==8.13.0   active

Root Cause

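Hybrid search (dense + sparse) often fuses the two result sets with a weighted sum. The two score sets are on different scales. Dense embedding similarity, such as cosine similarity, is bounded in [-1, 1] and often falls between 0 and 1 in practice. Sparse BM25/SPLADE scores are unbounded and commonly range from 0 to 20 or more. If the scores are not normalized to a common range before fusion, the larger-scale component dominates the ranking. A similarity threshold tuned for one scale can then filter out every result.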
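The effect is easy to reproduce. This is a minimal sketch that assumes made-up scores for three hypothetical documents and a plain weighted sum with `alpha = 0.5`. The document IDs, values, and the `naive_fuse` helper are illustrative only. The dense retriever's best match, `doc_a`, ends up last because BM25 values in the tens swamp cosine values below 1.

```python
# Hypothetical raw scores for three documents (illustrative values, not measured data).
dense = {"doc_a": 0.91, "doc_b": 0.62, "doc_c": 0.55}   # cosine similarity
sparse = {"doc_a": 3.1, "doc_b": 14.8, "doc_c": 9.7}    # raw BM25


def naive_fuse(dense, sparse, alpha=0.5):
    """Weighted sum of raw, unnormalized scores (the problematic pattern)."""
    return {d: alpha * dense[d] + (1 - alpha) * sparse[d] for d in dense}


fused = naive_fuse(dense, sparse)
ranking = sorted(fused, key=fused.get, reverse=True)
bm25_ranking = sorted(sparse, key=sparse.get, reverse=True)

print(ranking)                  # ['doc_b', 'doc_c', 'doc_a']
print(ranking == bm25_ranking)  # True: the fused order is just the BM25 order
```

The dense component still contributes to every fused score. However, its range (about 0.36 here) is far smaller than the BM25 range (about 11.7), so it never changes the order.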

generic

Official Documentation

https://qdrant.tech/documentation/concepts/hybrid-queries/#score-normalization

Workarounds

  1. 90% success. Normalize both score sets to [0, 1] with min-max scaling before fusion. Key the scores by document ID: the dense and sparse retrievers usually return different documents, so zipping two positional lists pairs unrelated hits. Example:
    
    def normalize_scores(scores):
        # scores: {doc_id: raw_score} from one retriever
        if not scores:
            return {}
        min_s, max_s = min(scores.values()), max(scores.values())
        if max_s == min_s:
            # All hits tie (e.g. a single hit): keep them at 1.0 instead of 0.0,
            # which would push them below any positive threshold.
            return {doc_id: 1.0 for doc_id in scores}
        return {doc_id: (s - min_s) / (max_s - min_s) for doc_id, s in scores.items()}
    
    alpha = 0.5  # weight on the dense component; (1 - alpha) goes to sparse
    dense = normalize_scores(dense_scores)    # dense_scores: {doc_id: cosine score}
    sparse = normalize_scores(sparse_scores)  # sparse_scores: {doc_id: BM25/SPLADE score}
    # Fuse over the union of IDs; a doc missing from one retriever scores 0 there.
    hybrid_scores = {doc_id: alpha * dense.get(doc_id, 0.0) + (1 - alpha) * sparse.get(doc_id, 0.0)
                     for doc_id in dense.keys() | sparse.keys()}
  2. 95% success. Use Reciprocal Rank Fusion (RRF) instead of score-based fusion. RRF scores each document by its position in each result list: it sums 1 / (k + rank) across lists. Raw score magnitudes never enter the fusion, so the scale mismatch disappears. Example:
    
    def reciprocal_rank_fusion(dense_ranks, sparse_ranks, k=60):
        # dense_ranks / sparse_ranks: doc IDs ordered best-first by each retriever.
        # k softens the gap between top ranks; 60 is the commonly used default.
        scores = {}
        for ranking in (dense_ranks, sparse_ranks):
            for rank, doc_id in enumerate(ranking, start=1):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (k + rank)
        # (doc_id, fused_score) pairs, highest score first.
        return sorted(scores.items(), key=lambda x: x[1], reverse=True)
  3. 82% success. Use the vector database's built-in hybrid fusion instead of fusing on the client. Scalar quantization is not score normalization. In Qdrant, `quantization_config=models.ScalarQuantization(scalar=models.ScalarQuantizationConfig(type=models.ScalarType.INT8))` compresses stored vectors to INT8 to save memory, but it does not put dense and sparse scores on a common scale. Qdrant's Query API can fuse prefetched dense and sparse results server-side with `models.FusionQuery(fusion=models.Fusion.RRF)`. It requires Qdrant 1.10+ and qdrant-client 1.10+, which is newer than the 1.9.0 listed above.
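The two client-side fixes can be compared end to end. The sketch below is self-contained and uses made-up scores keyed by hypothetical document IDs. In this data, `doc_d` is returned only by the dense retriever and `doc_e` only by BM25. `minmax_normalize` and `minmax_fuse` are illustrative names, and `reciprocal_rank_fusion` follows the signature used in workaround 2.

```python
def minmax_normalize(scores):
    """Min-max scale a {doc_id: score} dict to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}  # ties keep full weight
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def minmax_fuse(dense, sparse, alpha=0.5):
    """Weighted sum of normalized scores over the union of returned docs."""
    nd, ns = minmax_normalize(dense), minmax_normalize(sparse)
    return {d: alpha * nd.get(d, 0.0) + (1 - alpha) * ns.get(d, 0.0)
            for d in nd.keys() | ns.keys()}


def reciprocal_rank_fusion(dense_ranks, sparse_ranks, k=60):
    """RRF over two best-first lists of doc IDs; returns (doc_id, score) pairs."""
    scores = {}
    for ranking in (dense_ranks, sparse_ranks):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (k + rank)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)


# Hypothetical retriever outputs (illustrative values only).
dense = {"doc_a": 0.91, "doc_b": 0.62, "doc_c": 0.55, "doc_d": 0.80}  # cosine
sparse = {"doc_a": 3.1, "doc_b": 14.8, "doc_c": 9.7, "doc_e": 12.0}   # BM25

fused = minmax_fuse(dense, sparse)
minmax_order = sorted(fused, key=fused.get, reverse=True)

dense_ranks = sorted(dense, key=dense.get, reverse=True)
sparse_ranks = sorted(sparse, key=sparse.get, reverse=True)
rrf_order = [doc_id for doc_id, _ in reciprocal_rank_fusion(dense_ranks, sparse_ranks)]

print(minmax_order)  # ['doc_b', 'doc_a', 'doc_e', 'doc_d', 'doc_c']
print(rrf_order[:3])  # ['doc_b', 'doc_a', 'doc_c']
```

With these inputs, min-max fusion ranks `doc_c` last. RRF puts it third, ahead of the hits that only one retriever found. RRF rewards agreement between retrievers, while min-max fusion keeps relative score magnitudes, so the two can disagree on the same data.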

Dead Ends

Common approaches that don't work:

  1. Lowering the similarity threshold to 0.0 to force results (80% fail)

    This returns all documents regardless of relevance, defeating the purpose of hybrid search and polluting the context with irrelevant documents.

  2. Switching to dense-only search (65% fail)

    This eliminates the sparse component, which may be crucial for keyword-based retrieval in domains like legal or medical where specific terms matter.

  3. Raising the sparse component's weight (1 - alpha) to 0.9 (70% fail)

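    Reweighting does not fix the scale mismatch; it only shifts which component dominates. With raw scores on different scales, the weight at which the dense signal starts to matter depends on each query's score ranges. No fixed weight keeps the fusion balanced.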
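The following sketch shows why reweighting fails, using made-up scores for three hypothetical documents (`fuse_raw` and `ranking` are illustrative helpers). With raw scores, sparse weights of 0.5 and 0.9 both reproduce the BM25 order. The dense signal only matters below a crossover weight computed from this particular query's score gaps.

```python
# Hypothetical raw scores for three documents (illustrative, not measured).
dense = {"doc_a": 0.91, "doc_b": 0.62, "doc_c": 0.55}  # cosine similarity
sparse = {"doc_a": 3.1, "doc_b": 14.8, "doc_c": 9.7}   # raw BM25


def fuse_raw(sparse_weight):
    """Weighted sum of unnormalized scores; dense gets 1 - sparse_weight."""
    return {d: (1 - sparse_weight) * dense[d] + sparse_weight * sparse[d] for d in dense}


def ranking(scores):
    """Doc IDs ordered best-first."""
    return sorted(scores, key=scores.get, reverse=True)


bm25_order = ranking(sparse)
for w in (0.5, 0.9):
    # Both weights reproduce the BM25 order; the dense top hit doc_a stays last.
    print(w, ranking(fuse_raw(w)))

# doc_a (dense winner) overtakes doc_b (BM25 winner) only when
# (1 - w) * dense_gap > w * sparse_gap, i.e. w < dense_gap / (dense_gap + sparse_gap).
dense_gap = dense["doc_a"] - dense["doc_b"]
sparse_gap = sparse["doc_b"] - sparse["doc_a"]
w_star = dense_gap / (dense_gap + sparse_gap)
print(round(w_star, 3))  # 0.024
```

The crossover weight comes from this query's score gaps, so a different query yields a different threshold. That per-query dependence is why tuning a single weight on raw scores does not generalize.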