huggingface runtime_error ai_generated partial

RuntimeError: Placeholder storage has not been allocated on MPS device

ID: huggingface/mps-fp16-cast-error

Fix Rate: 80%
Confidence: 85%
Evidence: 1
First Seen: 2023-06-15

Version Compatibility

Version               Status  Introduced  Deprecated  Notes
transformers>=4.30.0  active
torch>=2.0.0          active
macOS>=13.0           active

Root Cause

Model weights or tensors are cast to float16 on the Apple MPS backend, which does not fully support FP16, and the resulting allocation fails.

generic
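
To check whether a loaded model matches the root cause described above, the parameters can be scanned for float16 tensors that live on the MPS device. This is a minimal sketch: the helper name find_half_params_on_mps is hypothetical, and it only assumes the object exposes named_parameters() the way a torch.nn.Module does.

```python
def find_half_params_on_mps(model):
    """Return the names of parameters that are float16 and placed on an MPS device.

    `model` is assumed to behave like a torch.nn.Module: named_parameters()
    yields (name, parameter) pairs, and each parameter has .dtype and .device.
    """
    flagged = []
    for name, param in model.named_parameters():
        # Comparing the string form avoids importing torch in this helper;
        # str(torch.float16) is "torch.float16".
        is_half = str(param.dtype) == "torch.float16"
        on_mps = param.device.type == "mps"
        if is_half and on_mps:
            flagged.append(name)
    return flagged
```

An empty result means no parameter is both half precision and on MPS, so the cause probably lies elsewhere, for example in the input tensors.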


Official Documentation

https://pytorch.org/docs/stable/notes/mps.html

Workarounds

  1. 85% success: Load the model with torch_dtype=torch.float32 and move it with model.to('mps'), without any half-precision conversion.
  2. 70% success: Set the environment variable PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to remove the upper limit on MPS memory allocations.
  3. 95% success: Fall back to the CPU by setting device='cpu'; use float16 on the CPU if memory is a concern.
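
The first and third workarounds can be sketched together as follows. This is an illustration under assumptions, not a definitive fix: the helper names (choose_device, load_fp32, generate_once) are hypothetical, a causal language model is assumed, and the sketch keeps float32 on the CPU path as well rather than switching to float16.

```python
def choose_device(mps_available: bool, force_cpu: bool = False) -> str:
    """Return "mps" when the backend is usable, otherwise fall back to "cpu"."""
    if force_cpu or not mps_available:
        return "cpu"
    return "mps"


def load_fp32(model_id: str, force_cpu: bool = False):
    """Load a causal LM in float32 and place it on the chosen device."""
    # Imported lazily so choose_device() works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = choose_device(torch.backends.mps.is_available(), force_cpu)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Workaround 1: request float32 weights explicitly and never call .half().
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
    model.to(device)
    model.eval()
    return tokenizer, model, device


def generate_once(model_id: str, prompt: str, force_cpu: bool = False) -> str:
    """Run one short generation with the model and its inputs on the same device."""
    import torch

    tokenizer, model, device = load_fp32(model_id, force_cpu)
    # Move the tokenized inputs to the device that holds the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=8)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate_once("gpt2", "Hello") exercises the MPS path where available ("gpt2" is only an illustrative model id), and passing force_cpu=True applies the CPU fallback from workaround 3.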

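
The environment-variable workaround can also be applied from Python rather than from the shell. The sketch below assumes the variable should be in place before torch is imported, which is the conservative ordering; the helper name set_mps_high_watermark is hypothetical.

```python
import os


def set_mps_high_watermark(ratio: float = 0.0) -> str:
    """Set PYTORCH_MPS_HIGH_WATERMARK_RATIO for the current process."""
    value = str(ratio)
    os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = value
    return value


# Call this first, then import torch and load the model.
set_mps_high_watermark(0.0)
```

A ratio of 0.0 removes the allocator's upper limit, so a model that is too large can exhaust system memory instead of raising an out-of-memory error.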

Dead Ends

Common approaches that don't work:

  1. 95% fail: The MPS backend has limited FP16 support; explicit FP16 casting triggers the error.

  2. 90% fail: Half-precision conversion on MPS is not fully implemented and causes memory allocation errors.