cudaErrorInvalidPc (718)
cuda
runtime_error
ai_generated
partial
CUDA error: invalid program counter (cudaErrorInvalidPc)
ID: cuda/cuda-error-invalid-pc
Fix rate: 75%
Confidence: 82%
Evidence count: 1
First seen: 2025-04-05
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| CUDA 12.4 | active | — | — | — |
| CUDA 12.6 | active | — | — | — |
| NVIDIA Driver 550.54.10 | active | — | — | — |
| NVIDIA Driver 560.35.03 | active | — | — | — |
| PyTorch 2.5.0 | active | — | — | — |
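Root cause analysis
A GPU thread's program counter left the valid code address range while a kernel was running (the CUDA runtime documentation describes this as the device program counter wrapping its address space). Typical causes are a corrupted or uninitialized device function pointer, a miscompiled kernel, or an out-of-bounds jump in device code, for example through a misused function pointer or indirect call. The error leaves the process in an inconsistent state: further CUDA calls return the same error until the process is terminated and relaunched.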
Official documentation
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html
Solutions
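- Compile the device code with `-lineinfo` and run the application under `compute-sanitizer` to help locate the source line behind the invalid jump: `compute-sanitizer --tool memcheck ./my_app`. The older `cuda-memcheck` tool was removed in CUDA 12, so it is not available with the CUDA 12.4 and 12.6 toolkits listed above.
- Avoid function pointers in device code where possible; replace them with `switch` statements or templates to eliminate indirect jumps.
- Ensure every device function pointer is initialized before use and holds a device-side address, never null, garbage, or a host-side address. Taking `&my_device_func` in host code does not yield a pointer that a kernel can call; take the address in device code instead, for example `typedef void (*func_t)(); __device__ func_t d_f = my_device_func;`, and copy it to the host with `cudaMemcpyFromSymbol` if it must be passed as a kernel argument.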
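The suggestion to replace device function pointers with `switch` statements or templates can be sketched roughly as follows. This is a minimal illustration, not code from the affected application: `Op`, `apply_op`, `Square`, `Negate`, `transform`, `map_kernel`, and the `HOST_DEVICE` macro are all hypothetical names, and the `__CUDACC__` guard is only there so the same file builds with `nvcc` or with a plain C++14 host compiler.

```cpp
// Hypothetical sketch: dispatch without device function pointers.
// Under nvcc the functions are usable from kernels; elsewhere they are plain C++.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Runtime selection: an opcode plus a switch replaces a stored function pointer,
// so there is no indirect jump whose target could be corrupted.
enum class Op : int { Square = 0, Negate = 1 };

HOST_DEVICE constexpr float apply_op(Op op, float x) {
    switch (op) {
        case Op::Square: return x * x;
        case Op::Negate: return -x;
    }
    // Unknown opcode (for example, read from corrupted memory):
    // fall back to the identity instead of jumping anywhere.
    return x;
}

// Compile-time selection: the operation is a template parameter, so the call
// is resolved by the compiler and typically inlined.
struct Square {
    HOST_DEVICE constexpr float operator()(float x) const { return x * x; }
};
struct Negate {
    HOST_DEVICE constexpr float operator()(float x) const { return -x; }
};

template <typename F>
HOST_DEVICE constexpr float transform(float x) {
    return F{}(x);
}

#ifdef __CUDACC__
// Example kernel using the template form; launched as
// map_kernel<Square><<<blocks, threads>>>(d_data, n);
template <typename F>
__global__ void map_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = transform<F>(data[i]);
}
#endif
```

The `switch` form suits cases where the operation is only known at run time (the opcode can be passed as an ordinary kernel argument), while the template form suits cases where it is fixed per launch. In both, an invalid selector cannot redirect control flow to an arbitrary address.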
Ineffective attempts
Common approaches that do not work:
- 80% failure rate. The error occurs during kernel execution, not during API calls; post-hoc checks do not prevent the invalid program counter from being reached.
- 90% failure rate. Scheduling policy does not affect kernel correctness; an invalid PC is a code bug, not a synchronization issue.