# CUDA error: MPS heap memory limit exceeded (cudaErrorMpsHeapMemoryLimitExceeded)

- **ID:** `cuda/mps-heap-limit-exceeded`
- **Domain:** cuda
- **Category:** resource_error
- **Error Code:** `cudaErrorMpsHeapMemoryLimitExceeded`
- **Verification:** ai_generated
- **Fix Rate:** 80%

## Root Cause

Under NVIDIA Multi-Process Service (MPS), each client process is subject to a heap memory limit set by the MPS server (via `CUDA_MPS_HEAP_SIZE`). This error is raised when the current process exhausts that limit, typically because it allocates many small tensors or fails to free memory in a long-running training loop.

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| CUDA 11.8 | active | — | — |
| CUDA 12.2 | active | — | — |
| MPS 1.0 | active | — | — |
| NVIDIA Driver 535.54 | active | — | — |

## Workarounds

1. **Increase the MPS heap size: set `CUDA_MPS_HEAP_SIZE` (for example `4G`, or a larger value such as `8G`) before starting the MPS daemon, then restart the daemon so that each client gets the larger heap.** (90% success)
   ```
   echo quit | nvidia-cuda-mps-control   # stop the running daemon, if any
   export CUDA_MPS_HEAP_SIZE=4G          # or larger, e.g. 8G
   nvidia-cuda-mps-control -d            # restart so the new limit applies
   ```
2. **Reduce allocation pressure: preallocate tensors once (for example with `torch.empty` or `torch.zeros`) and reuse them in place instead of creating new ones each iteration, and call `torch.cuda.empty_cache()` periodically to release unused cached blocks.** (75% success)
   ```
   import torch
   buf = torch.empty((1024, 1024), device="cuda")  # allocate once, outside the loop
   for step in range(1000):
       buf.zero_()                                 # reuse in place, no new allocation
       if step % 100 == 0:
           torch.cuda.empty_cache()                # release unused cached blocks
   ```
3. **Disable MPS and run a single process per GPU by stopping the MPS daemon. This removes the heap limit entirely, but processes can no longer share the GPU concurrently through MPS.** (95% success)
   ```
   echo quit | nvidia-cuda-mps-control
   ```
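
The tensor-reuse advice in workaround 2 can be sketched as a small buffer pool. This is a minimal illustration, not an existing PyTorch API: `BufferPool` and `cpu_alloc` are hypothetical names, and a `bytearray` allocator stands in for a GPU allocator so the sketch runs without CUDA.

```python
from typing import Callable, Dict, Tuple

Shape = Tuple[int, ...]


class BufferPool:
    """Hypothetical helper: hands out one cached buffer per shape."""

    def __init__(self, alloc: Callable[[Shape], object]) -> None:
        self._alloc = alloc                      # called only on a cache miss
        self._buffers: Dict[Shape, object] = {}
        self.allocations = 0                     # count of real allocations

    def get(self, shape: Shape) -> object:
        buf = self._buffers.get(shape)
        if buf is None:
            buf = self._alloc(shape)
            self._buffers[shape] = buf
            self.allocations += 1
        return buf

    def clear(self) -> None:
        """Drop all cached buffers so their memory can be reclaimed."""
        self._buffers.clear()


def cpu_alloc(shape: Shape) -> bytearray:
    """Stand-in allocator (assumption) so the sketch runs without a GPU."""
    size = 1
    for dim in shape:
        size *= dim
    return bytearray(size)


pool = BufferPool(cpu_alloc)
for step in range(100):
    scratch = pool.get((64, 64))   # same object on every iteration
    scratch[0] = step % 256        # overwrite in place instead of reallocating
```

With PyTorch, the allocator could plausibly be `lambda shape: torch.empty(shape, device="cuda")`, and `pool.clear()` followed by `torch.cuda.empty_cache()` would hand the memory back between phases of a job.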

## Dead Ends

- **Raising the PyTorch memory cap with `torch.cuda.set_per_process_memory_fraction`**: This setting only caps PyTorch's own caching allocator (and `torch.cuda.max_memory_allocated` merely reports peak usage; it is not a tunable limit). The MPS heap limit is independent of both, so changing the PyTorch cap does not affect the MPS server's heap allocation. (90% fail)
- **Restarting only the CUDA process without restarting the MPS server**: The heap limit belongs to the MPS server and persists across client restarts, so it stays in effect until the server itself is restarted. (80% fail)
- **Setting `CUDA_MPS_HEAP_SIZE=0` to disable the limit**: A value of 0 does not mean "unlimited"; it may cause undefined behavior or fall back to a very small limit. Set the variable to a positive value, or unset it to use the default (which is usually larger). (75% fail)
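
A pre-flight check can catch the zero-value dead end before the daemon is started. The sketch below is illustrative only: `check_heap_size` is a hypothetical helper, and the accepted size format (digits with an optional `K`, `M`, or `G` suffix) is an assumption inferred from the `4G` and `8G` examples in this entry.

```python
import re
from typing import Mapping, Optional

# Assumption: sizes look like "4G" or "512M", as in the examples in this entry.
_SIZE_RE = re.compile(r"^(\d+)([KMG]?)$", re.IGNORECASE)
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def check_heap_size(env: Mapping[str, str]) -> Optional[str]:
    """Return a warning for a bad CUDA_MPS_HEAP_SIZE value, else None."""
    raw = env.get("CUDA_MPS_HEAP_SIZE")
    if raw is None:
        return None  # unset: the server default applies
    match = _SIZE_RE.match(raw.strip())
    if match is None:
        return f"unrecognized size {raw!r}"
    size = int(match.group(1)) * _UNITS[match.group(2).upper()]
    if size == 0:
        return "size is 0; unset the variable or use a positive value"
    return None


# Example calls with plain dicts standing in for os.environ.
print(check_heap_size({"CUDA_MPS_HEAP_SIZE": "4G"}))
print(check_heap_size({"CUDA_MPS_HEAP_SIZE": "0"}))
```

In a launch script, passing `os.environ` to the helper and aborting on a non-`None` result would be one way to apply it.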
