huggingface config_error ai_generated true

RuntimeError: device_map='auto' is not supported when using Trainer with a model that has been loaded with device_map='auto'. Please set device_map=None or load the model on a single device.

ID: huggingface/device-map-auto-conflict-with-trainer

Fix rate: 90%
Confidence: 85%
Evidence count: 1
First seen: 2024-02-20

Version compatibility

Version | Status | Introduced | Deprecated | Notes
transformers 4.42.0 | active | - | - | -
accelerate 0.28.0 | active | - | - | -
torch 2.2.0 | active | - | - | -

Root cause analysis

Trainer manages device placement itself. A model loaded with `device_map='auto'` has already been split across devices by Accelerate, so the two placement schemes conflict and the RuntimeError above is raised.
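
To check whether a model is affected before handing it to Trainer, one option is to inspect the placement that was recorded at load time. The sketch below assumes the model exposes an `hf_device_map` attribute when it was loaded with a device map; `summarize_device_map` is a hypothetical helper name, not part of transformers.

```python
from collections import Counter
from typing import Any, Dict


def summarize_device_map(model: Any) -> Dict[str, int]:
    """Count how many modules are placed on each device.

    Returns an empty dict when the model has no `hf_device_map`,
    i.e. it was not loaded with a device map.
    """
    device_map = getattr(model, "hf_device_map", None) or {}
    return dict(Counter(str(device) for device in device_map.values()))
```

An empty result, or a result with a single device, suggests the model was not sharded; more than one key means its modules are spread across devices.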

generic

Official documentation

https://huggingface.co/docs/transformers/en/troubleshooting#device-map-issues

Solutions

  1. Load the model without a device map, `model = AutoModelForCausalLM.from_pretrained('model-name', device_map=None)`, then pass it to Trainer.
  2. Use `accelerate launch` with a config file to manage multi-GPU training, and set `device_map=None` in code.
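
The first solution could be sketched roughly as follows. `ensure_single_device` and `load_for_trainer` are hypothetical helper names introduced for illustration, and `'model-name'` remains a placeholder. The guard assumes that a model loaded with a device map exposes it as `hf_device_map`.

```python
from typing import Any


def ensure_single_device(model: Any) -> Any:
    """Raise if the model was sharded across devices by a device map."""
    device_map = getattr(model, "hf_device_map", None) or {}
    devices = {str(d) for d in device_map.values()}
    if len(devices) > 1:
        raise ValueError(
            f"Model is spread over {sorted(devices)}; reload it with device_map=None."
        )
    return model


def load_for_trainer(model_name: str) -> Any:
    """Load a causal LM without a device map so Trainer handles placement."""
    # Imported lazily so the guard above can be used without transformers installed.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(model_name, device_map=None)
    return ensure_single_device(model)
```

The returned model would then be passed to Trainer as usual, for example `Trainer(model=load_for_trainer('model-name'), ...)`. For the second solution the same script would be started with `accelerate launch` instead of plain `python`.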

Failed attempts

Common approaches that do not work:

  1. 100% failure rate

    Trainer does not accept a `device_map` parameter; it relies on the model's existing device map, so the same conflict occurs.

  2. 80% failure rate

    DataParallel is incompatible with Trainer's internal gradient accumulation and loss scaling, leading to a silent accuracy drop or a hang.