CUDNN_STATUS_BAD_PARAM pytorch runtime_error ai_generated true

RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM when calling cudnnSetRNNDescriptor_v8

ID: pytorch/cudnn-benchmark-algo-failure

Fix Rate: 80%
Confidence: 85%
Evidence: 1
First Seen: 2023-03-15

Version Compatibility

Version         Status   Introduced   Deprecated   Notes
PyTorch 2.0.0   active
PyTorch 2.1.0   active
cuDNN 8.9.0     active
CUDA 11.8       active
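
To compare a local setup against the versions listed above, a short check along the following lines may help. This is a minimal sketch rather than an official diagnostic: the function name cudnn_environment is hypothetical, and it only reads attributes that PyTorch exposes (torch.__version__, torch.version.cuda, torch.backends.cudnn).

```python
def cudnn_environment():
    """Collect the versions and cuDNN flags relevant to this error.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    info = {
        "torch": None,
        "cuda": None,
        "cudnn": None,
        "cudnn_enabled": None,
        "cudnn_benchmark": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; leave every field as None.
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against; None on CPU-only builds.
    info["cuda"] = torch.version.cuda
    # cuDNN version as an integer, or None when cuDNN is unavailable.
    info["cudnn"] = torch.backends.cudnn.version()
    info["cudnn_enabled"] = torch.backends.cudnn.enabled
    info["cudnn_benchmark"] = torch.backends.cudnn.benchmark
    return info
```

Calling print(cudnn_environment()) before training shows whether benchmark mode is on and which cuDNN build is actually loaded.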

Root Cause

This error occurs when torch.backends.cudnn.benchmark is enabled and an RNN uses a hidden size or num_layers value that cuDNN's heuristic search does not support, so the RNN descriptor is initialized with invalid parameters.

generic

Official Documentation

https://pytorch.org/docs/stable/notes/cuda.html#cudnn-benchmark

Workarounds

  1. 75% success

    Disable cuDNN benchmark: torch.backends.cudnn.benchmark = False. Also, set torch.backends.cudnn.enabled = False as a fallback if the error persists.

  2. 85% success

    Use a hidden size divisible by 64 (e.g., 256 instead of 250) and ensure num_layers <= 8. Example: model = nn.LSTM(input_size=128, hidden_size=256, num_layers=2, batch_first=True).

  3. 70% success

    Switch to PyTorch's native RNN implementation by setting torch.backends.cudnn.enabled = False before model creation. This forces the use of non-cuDNN kernels.
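
The three workarounds can be combined in one place. The sketch below is illustrative only: the helper names round_up_to_multiple, safe_rnn_config, apply_workarounds, and build_lstm are hypothetical, and the limits it enforces (a multiple of 64, at most 8 layers) are the values quoted in this entry, not limits taken from the cuDNN documentation.

```python
def round_up_to_multiple(value, multiple=64):
    """Round value up to the nearest multiple (250 -> 256 for multiple=64)."""
    return ((value + multiple - 1) // multiple) * multiple


def safe_rnn_config(hidden_size, num_layers, max_layers=8):
    """Return (hidden_size, num_layers) adjusted per workaround 2.

    max_layers=8 and the multiple of 64 come from this entry and are
    assumptions, not documented cuDNN constraints.
    """
    if num_layers > max_layers:
        raise ValueError(f"num_layers={num_layers} exceeds {max_layers}")
    return round_up_to_multiple(hidden_size), num_layers


def apply_workarounds(disable_cudnn=False):
    """Workaround 1, plus workaround 3 when disable_cudnn is True."""
    import torch  # imported lazily so the pure helpers work without PyTorch

    torch.backends.cudnn.benchmark = False
    if disable_cudnn:
        # Forces the non-cuDNN RNN kernels; call this before creating the model.
        torch.backends.cudnn.enabled = False


def build_lstm(input_size, hidden_size, num_layers=2, disable_cudnn=False):
    """Create an nn.LSTM after applying the backend settings above."""
    from torch import nn

    apply_workarounds(disable_cudnn=disable_cudnn)
    hidden_size, num_layers = safe_rnn_config(hidden_size, num_layers)
    return nn.LSTM(
        input_size=input_size,
        hidden_size=hidden_size,
        num_layers=num_layers,
        batch_first=True,
    )
```

For example, build_lstm(128, 250) would create an LSTM with hidden_size=256 and benchmark mode off; passing disable_cudnn=True additionally disables cuDNN and is best kept as a last resort, since the non-cuDNN kernels are usually slower.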

Dead Ends

Common approaches that don't work:

  1. 60% fail

    Upgrading PyTorch alone does not fix the cuDNN parameter validation; the root cause is the RNN configuration, not the library version.

  2. 70% fail

    Setting torch.backends.cudnn.deterministic = True does not prevent the benchmark from running the failing heuristic; it only affects algorithm selection.

  3. 50% fail

    Changing batch size does not affect the RNN descriptor parameters (hidden_size, num_layers), so the error persists.