CUDNN_STATUS_BAD_PARAM cuda runtime_error ai_generated true

RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM when calling cudnnSetRNNDescriptor_v8

ID: cuda/cudnn-rnn-hidden-size-mismatch

Fix Rate: 85%
Confidence: 88%
Evidence: 1
First Seen: 2023-06-20

Version Compatibility

Version            Status    Introduced    Deprecated    Notes
cuDNN 8.9.0        active    -             -             -
cuDNN 8.9.5        active    -             -             -
PyTorch 2.1.0      active    -             -             -
TensorFlow 2.14    active    -             -             -

Root Cause

cuDNN rejects the RNN descriptor for one of two reasons. Either the hidden size passed to an RNN/LSTM/GRU layer is not a multiple of 32 or 64 (which value applies depends on the cuDNN version and RNN mode), violating the alignment requirement of cuDNN's performance kernels, or the number of layers is zero.
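The two conditions above can be checked before any model is built. The following is a minimal sketch in plain Python; the helper names `next_multiple` and `check_rnn_config` are hypothetical, and the default alignment of 64 is taken from this entry rather than from the cuDNN documentation, so adjust it to 32 if that is what applies to your setup.

```python
from typing import List


def next_multiple(value: int, alignment: int = 64) -> int:
    """Round value up to the nearest multiple of alignment."""
    return ((value + alignment - 1) // alignment) * alignment


def check_rnn_config(hidden_size: int, num_layers: int, alignment: int = 64) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    # Condition 2 from the root cause: the layer count must not be zero.
    if num_layers < 1:
        problems.append(f"num_layers is {num_layers}; it must be at least 1")
    # Condition 1 from the root cause: hidden size alignment (assumed value).
    if hidden_size % alignment != 0:
        problems.append(
            f"hidden_size {hidden_size} is not a multiple of {alignment}; "
            f"next aligned size is {next_multiple(hidden_size, alignment)}"
        )
    return problems


print(check_rnn_config(hidden_size=100, num_layers=2))
```

For `hidden_size=100` the check reports one problem and suggests 128 as the next aligned size; an aligned configuration such as `hidden_size=128, num_layers=2` returns an empty list.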

generic

Official Documentation

https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetRNNDescriptor

Workarounds

  1. Set the hidden size to a multiple of 64 (or 32 for some cuDNN versions). For example, if hidden_size=100, change it to 128. In PyTorch: `nn.LSTM(input_size, hidden_size=128, num_layers=2)`. Verify by checking `hidden_size % 64 == 0`. (90% success)
  2. If you must keep an arbitrary hidden size, set `torch.backends.cudnn.allow_tf32 = False` and `torch.backends.cudnn.deterministic = True`. This may force a fallback implementation that does not enforce alignment, at a performance penalty. (70% success)
  3. Build the RNN with the hidden size rounded up to the next multiple of 64, pad the initial hidden state tensor to that size with `torch.nn.functional.pad` before passing it to the RNN, then slice the output back to the original size. (80% success)
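The padding workaround can be sketched as a thin wrapper around `nn.LSTM`. This is an illustration under stated assumptions, not a verified fix: the class name `PaddedLSTM` is hypothetical, the alignment of 64 comes from this entry, and the wrapped layer really does have more hidden units than requested, so it is not numerically equivalent to an unpadded LSTM of the original size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PaddedLSTM(nn.Module):
    """LSTM whose internal hidden size is rounded up to an aligned value."""

    def __init__(self, input_size, hidden_size, num_layers=1, alignment=64):
        super().__init__()
        self.hidden_size = hidden_size
        # Round up to the next multiple of the (assumed) alignment.
        self.padded_size = ((hidden_size + alignment - 1) // alignment) * alignment
        self.lstm = nn.LSTM(input_size, self.padded_size, num_layers=num_layers)

    def forward(self, x, state=None):
        if state is not None:
            # Zero-pad the last dimension of h0 and c0 up to the padded size.
            extra = self.padded_size - self.hidden_size
            state = tuple(F.pad(s, (0, extra)) for s in state)
        out, (h, c) = self.lstm(x, state)
        # Slice everything back to the size the caller asked for.
        n = self.hidden_size
        return out[..., :n], (h[..., :n], c[..., :n])


model = PaddedLSTM(input_size=10, hidden_size=100, num_layers=2)
x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)
out, (h, c) = model(x)
print(out.shape, h.shape)
```

The sketch runs on CPU as written; moving `model` and `x` to a CUDA device is what would exercise the cuDNN path. Slicing the returned state discards the extra units, so feeding it back in on the next call restarts those units from zero.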

Dead Ends

Common approaches that don't work:

  1. Setting `torch.backends.cudnn.enabled = False` to disable cuDNN (70% fail)

    With cuDNN disabled, the framework may fall back to a non-cuDNN RNN implementation that still validates hidden size; the fallback is also significantly slower.

  2. Reducing the number of RNN layers arbitrarily (90% fail)

    The error is about hidden size alignment, not layer count. Changing the layer count only helps if num_layers was zero, which is rare, and in that case the fix is to raise it to at least 1, not to reduce it.

  3. Switching to a different RNN cell type (e.g., LSTM to GRU) without changing hidden size (85% fail)

    The alignment requirement applies to all cuDNN RNN cells; the error persists if hidden size is not a multiple of the alignment.