huggingface
config_error
ai_generated
true
RuntimeError: device_map='auto' is not supported when using Trainer with a model that has been loaded with device_map='auto'. Please set device_map=None or load the model on a single device.
ID: huggingface/device-map-auto-conflict-with-trainer
Fix rate: 90%
Confidence: 85%
Evidence count: 1
First seen: 2024-02-20
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| transformers 4.42.0 | active | — | — | — |
| accelerate 0.28.0 | active | — | — | — |
| torch 2.2.0 | active | — | — | — |
Root cause analysis
Trainer manages device placement itself. A model loaded with `device_map='auto'` has already been split across devices by Accelerate (model parallelism), and the two placement schemes conflict, which surfaces as the RuntimeError above.
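To check whether a model is in this state before handing it to Trainer, one option is to inspect the `hf_device_map` attribute that transformers attaches to models loaded with a device map. The following is a minimal sketch, not Trainer's own check: the helper name `spans_multiple_devices` is hypothetical, and `SimpleNamespace` objects stand in for real models so the snippet runs without downloading anything.

```python
from types import SimpleNamespace


def spans_multiple_devices(model) -> bool:
    """Return True if the model's device map places weights on more than one accelerator."""
    # Models loaded without a device map normally have no hf_device_map attribute.
    device_map = getattr(model, "hf_device_map", None)
    if not device_map:
        return False
    # Ignore CPU/disk offload targets; count only distinct accelerator devices.
    accelerators = {d for d in device_map.values() if d not in ("cpu", "disk")}
    return len(accelerators) > 1


# Stand-in objects (assumptions for illustration, not real models).
sharded = SimpleNamespace(hf_device_map={"model.embed_tokens": 0, "model.layers": 1})
single = SimpleNamespace(hf_device_map={"": 0})
plain = SimpleNamespace()

for name, m in [("sharded", sharded), ("single", single), ("plain", plain)]:
    print(name, spans_multiple_devices(m))
```

If the helper returns True for a real model, that model was most likely loaded with a multi-device map and is a candidate for the conflict described here.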
Official documentation
https://huggingface.co/docs/transformers/en/troubleshooting#device-map-issues
Solutions
- Load the model without a device map, `model = AutoModelForCausalLM.from_pretrained('model-name', device_map=None)`, and then pass it to Trainer.
- Use `accelerate launch` with a config file to manage multi-GPU training, and set `device_map=None` in code.
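The first fix can be sketched as a small loading wrapper that always overrides `device_map` before calling `from_pretrained`. This is an illustrative sketch under stated assumptions: `trainer_safe_kwargs` and `load_model_for_trainer` are hypothetical helper names, and `transformers` is imported lazily so the keyword-cleaning step can be exercised on its own.

```python
def trainer_safe_kwargs(load_kwargs: dict) -> dict:
    """Return a copy of the loading kwargs with device_map forced to None."""
    cleaned = dict(load_kwargs)   # copy so the caller's dict is not mutated
    cleaned["device_map"] = None  # leave device placement to Trainer
    return cleaned


def load_model_for_trainer(model_name: str, **load_kwargs):
    """Load a causal LM without a device map so it can be passed to Trainer."""
    # Lazy import: only needed when a model is actually loaded.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_name, **trainer_safe_kwargs(load_kwargs)
    )


# The override applies even if a caller still passes device_map="auto".
requested = {"device_map": "auto", "torch_dtype": "auto"}
print(trainer_safe_kwargs(requested))
```

For the second fix, the training script containing such a loader would be started through `accelerate launch` with a config file of your own, rather than run directly with `python`.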
Failed attempts
Common but ineffective approaches:
- 100% failure rate: Trainer does not accept a `device_map` parameter; it relies on the model's existing device map, so the same conflict occurs.
- 80% failure rate: DataParallel is incompatible with Trainer's internal gradient accumulation and loss scaling, leading to a silent accuracy drop or a hang.