cudaErrorInvalidPc (718) cuda runtime_error ai_generated partial

CUDA error: invalid program counter (cudaErrorInvalidPc)

ID: cuda/cuda-error-invalid-pc

75% fix rate
82% confidence
1 evidence item
First seen 2025-04-05

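Version compatibility

| Version | Status | Introduced | Deprecated | Notes |
| --- | --- | --- | --- | --- |
| CUDA 12.4 | active | | | |
| CUDA 12.6 | active | | | |
| NVIDIA Driver 550.54.10 | active | | | |
| NVIDIA Driver 560.35.03 | active | | | |
| PyTorch 2.5.0 | active | | | |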

Root cause analysis

The device encountered an invalid program counter while executing a kernel. Typical causes are a corrupted or uninitialized device function pointer, a miscompiled kernel, or an out-of-bounds jump in device code (for example, from a misused function pointer or indirect call). The error leaves the process in an inconsistent state: every subsequent CUDA call returns the same error until the process is terminated and relaunched.

generic

Official documentation

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html

Solutions

  1. Compile the kernel with `-lineinfo` and run the application under `compute-sanitizer` to locate the source line that triggers the invalid jump: `compute-sanitizer --tool memcheck ./my_app`. The older `cuda-memcheck` tool was removed in CUDA 12.0, so it is not available on CUDA 12.x.
  2. Avoid function pointers in device code where possible; replace them with `switch` statements or templates so the compiler emits direct calls instead of indirect jumps.
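  3. Ensure every device function pointer holds a valid device address and is never left null or uninitialized. Taking `&my_device_func` in host code does not yield a usable device address. Instead, initialize a `__device__` variable in device code, for example `typedef void (*func_t)(); __device__ func_t d_f = my_device_func;`, and copy its value to the host with `cudaMemcpyFromSymbol(&h_f, d_f, sizeof(func_t));` before passing it to a kernel.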
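
Solutions 2 and 3 can be sketched together as follows. This is a minimal illustration rather than a definitive implementation: all identifiers (`op_t`, `OpKind`, `square`, `negate`, `d_square_ptr`, `apply_switch`, `apply_pointer`, `run_square`) are hypothetical, and the helper assumes `n > 0` and a single default device.

```cuda
#include <cuda_runtime.h>

// All identifiers in this sketch are hypothetical names chosen for illustration.
typedef float (*op_t)(float);
enum OpKind { OP_SQUARE = 0, OP_NEGATE = 1 };

__device__ float square(float x) { return x * x; }
__device__ float negate(float x) { return -x; }

// Option A (solution 2): select the operation with a switch, so every call is direct.
__global__ void apply_switch(OpKind kind, const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    switch (kind) {
        case OP_SQUARE: out[i] = square(in[i]); break;
        case OP_NEGATE: out[i] = negate(in[i]); break;
        default:        out[i] = in[i];         break;  // unknown kind: pass through
    }
}

// Option B (solution 3): capture the function address in device code.
__device__ op_t d_square_ptr = square;

__global__ void apply_pointer(op_t op, const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && op != nullptr) {
        out[i] = op(in[i]);  // indirect call through a valid device address
    }
}

// Host helper: squares n floats with either kernel.
// Returns 0 (cudaSuccess) or the first CUDA error code encountered.
int run_square(bool use_pointer, const float* h_in, float* h_out, int n) {
    op_t h_op = nullptr;
    cudaError_t err = cudaSuccess;
    if (use_pointer) {
        // Copy the device-side address to the host instead of writing &square here.
        err = cudaMemcpyFromSymbol(&h_op, d_square_ptr, sizeof(op_t));
        if (err != cudaSuccess) return static_cast<int>(err);
    }
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    err = cudaMalloc(&d_in, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_out, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 128;
        const int blocks = (n + threads - 1) / threads;
        if (use_pointer) apply_pointer<<<blocks, threads>>>(h_op, d_in, d_out, n);
        else             apply_switch<<<blocks, threads>>>(OP_SQUARE, d_in, d_out, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return static_cast<int>(err);
}
```

Calling `run_square(false, in, out, n)` exercises the switch-based kernel and `run_square(true, in, out, n)` the pointer-based one; both should fill `out` with the squares of `in`. The switch variant is usually the safer default because no address ever crosses the host/device boundary.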

Ineffective attempts

Common but ineffective approaches:

  1. 80% failure rate

    The error occurs during kernel execution, not during API calls; post-hoc checks do not prevent the invalid program counter from being reached.

  2. 90% failure rate

    Scheduling policy does not affect kernel correctness; the invalid PC is a code bug, not a synchronization issue.
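
To illustrate why checks after the fact can only report this error, the sketch below shows where a fault raised during kernel execution would surface on the host. The names `noop_kernel` and `launch_and_report` are hypothetical, and the empty kernel is a stand-in for real device code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a real kernel.
__global__ void noop_kernel() {}

// Hypothetical helper: returns 0 if the kernel launched and ran cleanly,
// otherwise the CUDA error code. It reports a fault; it cannot prevent one.
int launch_and_report() {
    noop_kernel<<<1, 1>>>();
    cudaError_t err = cudaGetLastError();  // launch configuration errors only
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();     // faults raised while the kernel ran surface here
    }
    if (err != cudaSuccess) {
        // For errors such as cudaErrorInvalidPc, later CUDA calls keep returning
        // the same error, so the process has to be relaunched to recover.
        std::fprintf(stderr, "CUDA failure: %s (%s)\n",
                     cudaGetErrorName(err), cudaGetErrorString(err));
    }
    return static_cast<int>(err);
}
```

By the time the synchronizing call returns a non-zero code, the invalid jump has already happened on the device, which is why the fix has to be made in the device code itself.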