huggingface config_error ai_generated true

ValueError: You passed `quantization_config` with `bnb_4bit_compute_dtype=torch.float16` but the model weights are loaded in torch.float32. This may cause unexpected behavior.

ID: huggingface/quantization-config-bnb-compute-dtype-mismatch

Fix Rate: 85%
Confidence: 83%
Evidence: 1
First Seen: 2024-03-12

Version Compatibility

Version                 Status   Introduced   Deprecated   Notes
transformers>=4.36.0    active
bitsandbytes>=0.41.0    active
torch>=2.0.0            active

Root Cause

The compute dtype specified in the quantization config (bnb_4bit_compute_dtype, here torch.float16) does not match the dtype in which the model weights are actually loaded (here torch.float32), which causes precision inconsistencies.
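
The comparison behind this root cause can be sketched in a few lines. This is only an illustration, not the check transformers itself performs: dtype_mismatch is a hypothetical helper, and it assumes both dtypes are given as names such as "torch.float16" (the form that str(torch.float16) produces).

```python
from typing import Optional


def dtype_mismatch(compute_dtype: str, weights_dtype: str) -> Optional[str]:
    """Return a description of the mismatch, or None when the dtypes agree.

    Hypothetical helper: both arguments are dtype names such as
    "torch.float16" or "float32".
    """

    def short(name: str) -> str:
        # Drop the optional "torch." prefix so both spellings compare equal.
        return name[len("torch."):] if name.startswith("torch.") else name

    if short(compute_dtype) == short(weights_dtype):
        return None
    return (
        f"compute dtype {short(compute_dtype)} != "
        f"weights dtype {short(weights_dtype)}"
    )
```

With a loaded model, the two names could plausibly be obtained as str(quant_config.bnb_4bit_compute_dtype) and str(model.dtype); a None result means the two settings agree.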



Official Documentation

https://huggingface.co/docs/transformers/main/en/quantization#bitsandbytes

Workarounds

  1. 90% success: Set torch_dtype='auto' in from_pretrained to match the quantization config:
     model = AutoModel.from_pretrained('model', quantization_config=quant_config, torch_dtype='auto')
  2. 85% success: Explicitly set bnb_4bit_compute_dtype to match the model's dtype:
     quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float32)
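
The two workarounds above can be written out as a minimal sketch. The function names and MODEL_ID are illustrative assumptions; 'model' is the placeholder from the workaround text, not a real checkpoint. The imports sit inside the functions so the module can be read and imported without torch, transformers, or bitsandbytes installed. Actually calling either function assumes those packages are available and, for 4-bit loading, typically a CUDA GPU.

```python
from typing import Any

# Placeholder taken from the workaround text; substitute a real checkpoint id.
MODEL_ID = "model"


def load_with_auto_dtype(model_id: str = MODEL_ID) -> Any:
    """Workaround 1: pass torch_dtype='auto' alongside the quantization config."""
    import torch
    from transformers import AutoModel, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )
    # 'auto' lets from_pretrained choose the weight dtype instead of
    # falling back to float32.
    return AutoModel.from_pretrained(
        model_id,
        quantization_config=quant_config,
        torch_dtype="auto",
    )


def load_with_float32_compute(model_id: str = MODEL_ID) -> Any:
    """Workaround 2: set the compute dtype to float32 to match float32 weights."""
    import torch
    from transformers import AutoModel, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float32,
    )
    return AutoModel.from_pretrained(model_id, quantization_config=quant_config)
```

Only one of the two is needed. The first keeps float16 compute, while the second trades speed and memory for float32 compute.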


Dead Ends

Common approaches that don't work:

  1. 70% fail: If the quantization config specifies float16 but the model weights are float32, the error persists.
  2. 90% fail: The mismatch causes incorrect computations, especially in mixed-precision training.