huggingface config_error ai_generated true

ValueError: You passed `quantization_config` with `bnb_4bit_compute_dtype=torch.float16` but the model weights are loaded in torch.float32. This may cause unexpected behavior.

ID: huggingface/quantization-config-bnb-compute-dtype-mismatch

Fix rate: 85% · Confidence: 83% · Evidence: 1 · First seen: 2024-03-12

Version Compatibility

Requirement	Status	Introduced	Deprecated	Notes
transformers>=4.36.0	active	—	—	—
bitsandbytes>=0.41.0	active	—	—	—
torch>=2.0.0	active	—	—	—

Root Cause

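The compute dtype set in the quantization config (`bnb_4bit_compute_dtype`, here `torch.float16`) does not match the dtype the model weights are loaded in (here `torch.float32`, typically because `torch_dtype` was not passed to `from_pretrained`). Activations are then cast between the two precisions, which produces precision inconsistencies.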
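The toy sketch below shows why mixing precisions gives inconsistent numbers. It emulates a dot product in pure Python and rounds every product and partial sum to IEEE half precision (float16) or single precision (float32), using the standard library `struct` formats `'e'` and `'f'`. The helper names are hypothetical. Rounding at every step is a simplifying assumption, and this is not how bitsandbytes or its CUDA kernels actually accumulate.

```python
import struct

def round_to(value: float, fmt: str) -> float:
    """Round a Python float to float16 ('e') or float32 ('f') by packing and unpacking it."""
    return struct.unpack("<" + fmt, struct.pack("<" + fmt, value))[0]

def toy_dot(weights, inputs, fmt: str) -> float:
    """Hypothetical helper: dot product with every product and partial sum rounded to `fmt`."""
    total = 0.0
    for w, x in zip(weights, inputs):
        product = round_to(round_to(w, fmt) * round_to(x, fmt), fmt)
        total = round_to(total + product, fmt)  # accumulate in the reduced precision
    return total

weights = [1.0] * 1000  # stand-in for loaded weights
inputs = [0.1] * 1000   # stand-in for activations
exact = 100.0           # mathematically exact result

fp32_result = toy_dot(weights, inputs, "f")
fp16_result = toy_dot(weights, inputs, "e")
print(f"float32 compute: {fp32_result:.6f}  error={abs(fp32_result - exact):.6f}")
print(f"float16 compute: {fp16_result:.6f}  error={abs(fp16_result - exact):.6f}")
```

With 1,000 terms, the float32 run stays well within 0.01 of the exact value of 100. The float16 run drifts by several units because float16 has only 10 explicit mantissa bits.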

Official Documentation

https://huggingface.co/docs/transformers/main/en/quantization#bitsandbytes

Workarounds

90% success: Pass `torch_dtype='auto'` to `from_pretrained` so the weights load in the dtype saved with the checkpoint instead of `torch.float32`. This resolves the mismatch when that dtype equals `bnb_4bit_compute_dtype`.
```
import torch
from transformers import AutoModel, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
# 'model' is a placeholder for a Hub model ID or local path
model = AutoModel.from_pretrained('model', quantization_config=quant_config, torch_dtype='auto')
```
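Whether `torch_dtype='auto'` removes the mismatch depends on the checkpoint. According to the Transformers documentation, `'auto'` uses the `torch_dtype` entry in the checkpoint's `config.json` when one is present. Otherwise it falls back to the dtype of the first floating-point weight. The pure-Python sketch below models only the `config.json` lookup. `resolve_auto_dtype` and `dtypes_match` are hypothetical helpers, not Transformers APIs. The fixed `float32` fallback is a simplification, and the config strings are made-up examples.

```python
import json

def resolve_auto_dtype(config: dict, fallback: str = "float32") -> str:
    """Hypothetical: mimic how torch_dtype='auto' reads the dtype from config.json.

    The real loader inspects checkpoint weights when the entry is missing;
    this sketch substitutes a fixed `fallback` instead.
    """
    return config.get("torch_dtype") or fallback

def dtypes_match(config_json: str, compute_dtype: str) -> bool:
    """Hypothetical: True if the 'auto'-resolved weight dtype equals the compute dtype."""
    return resolve_auto_dtype(json.loads(config_json)) == compute_dtype

fp16_ckpt = '{"model_type": "llama", "torch_dtype": "float16"}'
bf16_ckpt = '{"model_type": "llama", "torch_dtype": "bfloat16"}'

print(dtypes_match(fp16_ckpt, "float16"))  # True: 'auto' lines up with the compute dtype
print(dtypes_match(bf16_ckpt, "float16"))  # False: weights would load in bfloat16
```

In the `bfloat16` case, `'auto'` would load the weights in `bfloat16` while the compute dtype stays `float16`. The compute dtype would then need to follow the checkpoint instead.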
85% success: Explicitly set `bnb_4bit_compute_dtype` to the dtype the weights are loaded in, here `torch.float32`. This removes the mismatch, although 4-bit layers computed in float32 are typically slower than in float16.
```
import torch
from transformers import BitsAndBytesConfig

# Compute in float32 to match weights loaded in float32
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float32)
```
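Both workarounds follow the same rule: make the weight dtype and the compute dtype equal. The first workaround moves the weights, and the second moves the compute dtype. The hypothetical pre-flight check below is not part of Transformers or bitsandbytes. It takes the two dtypes as plain strings and returns the corresponding suggestions.

```python
def check_bnb_dtypes(weight_dtype: str, compute_dtype: str) -> list[str]:
    """Hypothetical pre-flight check: suggest fixes when the two dtypes differ.

    weight_dtype:  dtype the model weights load in, e.g. "float32"
    compute_dtype: value intended for bnb_4bit_compute_dtype, e.g. "float16"
    """
    if weight_dtype == compute_dtype:
        return []  # dtypes already agree; nothing to fix
    return [
        # Option 1: bring the weights to the compute dtype
        f"load the weights in {compute_dtype} "
        f"(torch_dtype='auto' works if the checkpoint was saved in {compute_dtype})",
        # Option 2: bring the compute dtype to the weights
        f"set bnb_4bit_compute_dtype=torch.{weight_dtype}",
    ]

for suggestion in check_bnb_dtypes("float32", "float16"):
    print("-", suggestion)
```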

Dead Ends

Common approaches that don't work:

70% fail: Keeping `bnb_4bit_compute_dtype=torch.float16` while the model weights still load in `torch.float32`. The error persists.
90% fail: Proceeding without resolving the mismatch. It can cause incorrect computations, especially in mixed-precision training.