# CUDA Error: Driver Is in an Invalid State for the Requested Operation (cudaErrorDriverNotReady)

- **ID:** `cuda/cuda-error-driver-unloading`
- **Domain:** cuda
- **Category:** runtime_error
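- **Error code:** `cudaErrorDriverNotReady (804)` as recorded for this entry. The CUDA Runtime API documents 804 as `cudaErrorCompatNotSupportedOnDevice` and reports calls made while the runtime is shutting down as `cudaErrorCudartUnloading` (4), which matches the symptom described here.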
- **Verification level:** ai_generated
- **Fix rate:** 78%

## Root Cause

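The CUDA driver is being unloaded or has already been partially unloaded. This comes from a race condition during shutdown of a multi-threaded application, typically when `cudaDeviceReset()` is called while other threads still hold a CUDA context.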
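The ordering that avoids this race can be sketched without any GPU calls. The example below is a minimal illustration, not a definitive implementation: `worker`, `teardown`, `record`, and `shutdown_in_order` are hypothetical names, and `teardown` only stands in for the real reset call. The point is that every worker thread is joined before the teardown runs.

```python
import threading

events = []                      # ordered record of what ran, for illustration only
events_lock = threading.Lock()


def record(name):
    with events_lock:
        events.append(name)


def worker(index):
    # Placeholder for the GPU work done on this thread (hypothetical).
    record(f"work-{index}")


def teardown():
    # Stand-in for the device reset; in C++ this would be cudaDeviceReset().
    record("teardown")


def shutdown_in_order(num_workers=4):
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()            # wait until every worker has finished
    teardown()                   # runs only after all workers are done


shutdown_in_order()
```

Because the teardown is reached only after every `join()` returns, no worker can still be mid-operation when it runs.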

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| CUDA 11.8 | active | — | — |
| CUDA 12.0 | active | — | — |
| CUDA 12.1 | active | — | — |
| CUDA 12.2 | active | — | — |
| PyTorch 2.1.0 | active | — | — |
| PyTorch 2.2.0 | active | — | — |

## Solutions

1. Make sure every thread has released its CUDA resources before `cudaDeviceReset()` is called, for example by tracking users of the device with a thread-safe reference counter. In Python with PyTorch, release GPU resources explicitly before shutdown: `import torch; torch.cuda.synchronize(); del model; torch.cuda.empty_cache()`.
2. Avoid calling `cudaDeviceReset()` in multi-threaded environments; instead, rely on the driver to clean up contexts at process exit. In C++, remove explicit `cudaDeviceReset()` calls from destructors and `atexit` handlers.
3. Check the status returned by the reset call and ignore a failure that occurs during shutdown. The CUDA runtime API reports errors through its `cudaError_t` return value and does not throw, so a `try`/`catch` block around the call has no effect: `cudaError_t err = cudaDeviceReset(); if (err != cudaSuccess) { /* ignore during shutdown */ }`
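The thread-safe reference counter mentioned in the first solution could look roughly like the sketch below. It makes several assumptions: `DeviceRefCount` and `on_last_release` are hypothetical names rather than part of CUDA or PyTorch, the callback only stands in for the real reset, and the owning thread takes each worker's reference before starting that worker.

```python
import threading


class DeviceRefCount:
    """Thread-safe reference counter (hypothetical helper, not a CUDA or PyTorch API).

    The owner starts with one reference. `on_last_release` runs exactly once,
    in whichever thread drops the final reference.
    """

    def __init__(self, on_last_release):
        self._lock = threading.Lock()
        self._count = 1  # the owner's own reference
        self._on_last_release = on_last_release

    def acquire(self):
        with self._lock:
            if self._count == 0:
                raise RuntimeError("device already torn down")
            self._count += 1

    def release(self):
        with self._lock:
            if self._count == 0:
                raise RuntimeError("release() without a matching acquire()")
            self._count -= 1
            is_last = self._count == 0
        if is_last:
            # Run the teardown outside the lock so other threads are not held up.
            self._on_last_release()


reset_calls = []
# Stand-in for the real teardown (cudaDeviceReset() in C++); it only records the call.
refs = DeviceRefCount(on_last_release=lambda: reset_calls.append("reset"))


def worker():
    try:
        pass  # placeholder for GPU work on this thread
    finally:
        refs.release()  # drop this worker's reference even if the work raised


threads = []
for _ in range(8):
    refs.acquire()  # taken on the owner thread, before the worker starts
    thread = threading.Thread(target=worker)
    threads.append(thread)
    thread.start()

refs.release()  # the owner is done; teardown runs once the last worker releases
for thread in threads:
    thread.join()
```

Starting the count at one for the owner keeps the teardown from firing early if the first workers finish before later ones have been started.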

## Failed Attempts

- **Restarting:** The error occurs during shutdown, so restarting only delays the issue; the race condition persists on subsequent shutdowns. (70% failure rate)
- **Synchronizing before the reset:** Synchronization does not guarantee that all threads have released their contexts; the driver may still be in an invalid state if other threads are mid-operation. (60% failure rate)
