# torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 GiB. GPU 0 has a total capacity of 8.00 GiB; 7.80 GiB already allocated.

- **ID:** `llm/huggingface-model-load-oom-on-cpu`
- **Domain:** llm
- **Category:** resource_error
- **Error code:** `CUDA-OOM-001`
- **Verification level:** ai_generated
- **Fix rate:** 88%

## Root Cause

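Loading a Hugging Face model attempts to place the full model on the GPU, but not enough VRAM is available. Either other processes (for example, a previous model instance or data loaders) are already consuming memory, or the model itself is too large for the GPU.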
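To tell the two causes apart, a rough estimate of the weight footprint can help. The sketch below is illustrative only: the 8.00 GiB and 7.80 GiB figures come from the error message above, while the 7-billion-parameter model size and the helper names are assumptions chosen for the example. It counts weights only and ignores activations, buffers, and allocator overhead.

```python
GIB = 1024**3  # bytes per GiB


def weight_footprint_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


def free_vram_gib(total_gib: float, allocated_gib: float) -> float:
    """VRAM left over after what is already allocated, in GiB."""
    return total_gib - allocated_gib


# Figures from the error message: 8.00 GiB total, 7.80 GiB already allocated.
free_gib = free_vram_gib(8.00, 7.80)

# Hypothetical model size (an assumption, not taken from the source).
fp16_gib = weight_footprint_gib(7_000_000_000, 2)  # 2 bytes per float16 parameter

print(f"free VRAM: {free_gib:.2f} GiB, requested: 2.00 GiB")
print(f"7B parameters in float16: {fp16_gib:.2f} GiB of weights")
```

With these numbers the 2.00 GiB request fails because only about 0.20 GiB is free, and the hypothetical float16 model would need roughly 13 GiB, so it would not fit on an 8 GiB GPU even if the card were empty.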

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| transformers==4.36.0 | active | — | — |
| torch==2.1.0 | active | — | — |
| accelerate==0.25.0 | active | — | — |

## Solutions

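1. Load the model with `device_map='auto'` so that layers that do not fit on the GPU are offloaded to CPU or disk: `model = AutoModelForCausalLM.from_pretrained('model-name', device_map='auto', torch_dtype=torch.float16, offload_folder='/tmp/offload')`. This splits the model across GPU, CPU, and disk as needed.
2. Enable gradient checkpointing before training to reduce memory: `model.gradient_checkpointing_enable()`. It trades compute for memory by recomputing activations during the backward pass. This lowers activation memory during training; it does not shrink the weights themselves.
3. Explicitly clear GPU memory before loading the model: `import gc; gc.collect(); torch.cuda.empty_cache(); torch.cuda.reset_peak_memory_stats()`, then load the model with `low_cpu_mem_usage=True`. Note that `reset_peak_memory_stats()` only resets the peak-usage counters and frees nothing by itself.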
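The three steps above could be combined roughly as follows. This is a minimal sketch, assuming `transformers`, `torch`, and `accelerate` are installed at the versions listed earlier. The function names are hypothetical helpers introduced for illustration, and `'model-name'` remains a placeholder as in the original.

```python
import gc


def free_cached_gpu_memory() -> None:
    """Drop unreferenced Python objects, then release cached CUDA blocks."""
    import torch  # imported lazily so this module loads without a GPU stack

    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def load_with_offload(model_name: str, offload_folder: str = "/tmp/offload"):
    """Load a causal LM and let accelerate place layers on GPU, CPU, or disk."""
    import torch
    from transformers import AutoModelForCausalLM

    free_cached_gpu_memory()
    return AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",              # requires the accelerate package
        torch_dtype=torch.float16,      # halves weight memory versus float32
        offload_folder=offload_folder,  # used for layers that fit nowhere else
        low_cpu_mem_usage=True,
    )


def enable_checkpointing(model):
    """Turn on gradient checkpointing; only relevant when training."""
    model.gradient_checkpointing_enable()
    return model
```

A call such as `model = load_with_offload("model-name")` covers steps 1 and 3. `enable_checkpointing(model)` applies only when the model is going to be trained, and layers offloaded to CPU or disk will run noticeably slower than layers resident on the GPU.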

## Ineffective Attempts

- **Freeing GPU memory held by the current session:** this releases that memory, but it does not address the underlying memory fragmentation or model size issue. The error returns if the model is loaded again without adjustments. (60% failure rate)
- **Calling `torch.cuda.empty_cache()` on its own:** `empty_cache()` only releases unused cached blocks held by the memory allocator, not memory actively held by other tensors. It often has minimal effect when VRAM is fully consumed by model parameters. (80% failure rate)
- **Reducing the batch size:** the OOM occurs during model loading, not inference. Batch size does not affect the memory allocated for model parameters. (95% failure rate)
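The limited effect of `empty_cache()` can be checked by comparing allocated and reserved memory. The sketch below assumes a CUDA-enabled PyTorch build for `cuda_memory_report`; the helper names are hypothetical, and the 7.90 GiB reserved figure in the usage note is an invented value for illustration, since the error message only reports the allocated amount.

```python
GIB = 1024**3  # bytes per GiB


def reclaimable_gib(allocated_bytes: int, reserved_bytes: int) -> float:
    """Upper bound on what empty_cache() can return: reserved minus allocated."""
    return max(reserved_bytes - allocated_bytes, 0) / GIB


def cuda_memory_report(device: int = 0) -> dict:
    """Snapshot of this process's CUDA memory plus device-wide free/total, in GiB."""
    import torch  # imported lazily; needs a CUDA-enabled build at call time

    allocated = torch.cuda.memory_allocated(device)  # held by live tensors
    reserved = torch.cuda.memory_reserved(device)    # held by the caching allocator
    free, total = torch.cuda.mem_get_info(device)    # device-wide, all processes
    return {
        "allocated_gib": allocated / GIB,
        "reserved_gib": reserved / GIB,
        "reclaimable_gib": reclaimable_gib(allocated, reserved),
        "device_free_gib": free / GIB,
        "device_total_gib": total / GIB,
    }
```

For example, with 7.80 GiB allocated and a hypothetical 7.90 GiB reserved, `reclaimable_gib` gives about 0.10 GiB, far short of the 2.00 GiB the failed allocation asked for. A large gap between `device_total_gib - device_free_gib` and `reserved_gib` would instead point to memory held by other processes.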