huggingface
config_error
ai_generated
true
RuntimeError: device_map='auto' is not supported when using Trainer with a model that has been loaded with device_map='auto'. Please set device_map=None or load the model on a single device.
ID: huggingface/device-map-auto-conflict-with-trainer
Fix Rate: 90%
Confidence: 85%
Evidence: 1
First Seen: 2024-02-20
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| transformers 4.42.0 | active | — | — | — |
| accelerate 0.28.0 | active | — | — | — |
| torch 2.2.0 | active | — | — | — |
Root Cause
Trainer manages device placement internally. This conflicts with the model parallelism that Accelerate sets up when the model is loaded with `device_map='auto'`, and the conflict surfaces as the RuntimeError shown above.
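To check whether a model was loaded with a device map before handing it to Trainer, a small diagnostic along these lines may help. It is a sketch, not part of the library: the helper name `devices_in_device_map` is hypothetical, and it assumes the loaded model exposes its placement through an `hf_device_map` attribute (a dict from module name to device), which models loaded without a device map are assumed not to have.

```python
from typing import Any, Set


def devices_in_device_map(model: Any) -> Set[str]:
    """Return the distinct devices recorded in the model's device map, if any.

    Assumption: models loaded with a device map carry an `hf_device_map`
    dict (module name -> device); models loaded without one do not.
    """
    device_map = getattr(model, "hf_device_map", None) or {}
    # Normalise device identifiers (ints, "cpu", "disk", ...) to strings.
    return {str(device) for device in device_map.values()}
```

An empty result suggests the model carries no device map; more than one entry suggests the model is spread across devices, which is the situation described above.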
Official Documentation
https://huggingface.co/docs/transformers/en/troubleshooting#device-map-issues
Workarounds
- 90% success: Load the model without a device map, `model = AutoModelForCausalLM.from_pretrained('model-name', device_map=None)`, and then pass it to Trainer.
- 85% success: Use `accelerate launch` with a config file to manage multi-GPU training, and set `device_map=None` in code.
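The first workaround can be sketched as a pair of helpers that force `device_map=None` regardless of what the caller passed. The helper names `trainer_safe_kwargs` and `load_for_trainer` are hypothetical, and `'model-name'` is the same placeholder used in the workaround text; only `AutoModelForCausalLM.from_pretrained` and its `device_map` argument come from the entry itself.

```python
from typing import Any, Dict


def trainer_safe_kwargs(load_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the from_pretrained kwargs with device_map set to None."""
    cleaned = dict(load_kwargs)  # copy so the caller's dict is not mutated
    cleaned["device_map"] = None  # leave device placement to Trainer
    return cleaned


def load_for_trainer(model_name: str, **load_kwargs: Any):
    """Load a causal LM without a device map so it can be passed to Trainer."""
    # Imported lazily so trainer_safe_kwargs works without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_name, **trainer_safe_kwargs(load_kwargs)
    )
```

A call such as `model = load_for_trainer('model-name', device_map='auto')` would then load the model with `device_map=None`, and the result can be passed to Trainer as usual. When launching across several GPUs with `accelerate launch`, the same loading code applies, since the second workaround also keeps `device_map=None` in code.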
Dead Ends
Common approaches that don't work:
- 100% fail: Trainer does not accept a `device_map` parameter; it relies on the model's existing device map, so the same conflict occurs.
- 80% fail: DataParallel is incompatible with Trainer's internal gradient accumulation and loss scaling, leading to a silent accuracy drop or a hang.