CUDNN_STATUS_NOT_SUPPORTED (5) · cuda · runtime_error · ai_generated · partial

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED when calling cudnnRNNBackwardData_v8 with training mode enabled and double backward

ID: cuda/cudnn-rnn-double-backward

Fix rate: 78%

Confidence: 82%

Evidence: 1

First seen: 2023-10-25

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
cuDNN 8.9.0	active	—	—	—
cuDNN 8.9.5	active	—	—	—
PyTorch 2.1.0	active	—	—	—
PyTorch 2.2.0	active	—	—	—

Root Cause

cuDNN's RNN backward operations (in particular backward data when a double backward is requested) are not supported for certain RNN modes (e.g., LSTM with projection), or when the input tensor requires grad and the graph is retained. cuDNN v8 restricts double backward support to specific configurations.
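The failing pattern is a double backward through an RNN: a first gradient taken with `create_graph=True` (for example, an input-gradient penalty) that is then itself backpropagated. The sketch below reproduces that pattern; `input_grad_penalty` is a hypothetical helper. As written it runs on CPU in float64, where PyTorch does not use cuDNN, so it completes. On a CUDA device with cuDNN enabled, the final `backward()` is the step that exercises the double-backward path this entry describes.

```python
import torch
import torch.nn as nn


def input_grad_penalty(rnn: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: squared norm of d(sum of outputs)/d(input)."""
    x = x.detach().requires_grad_(True)
    out, _ = rnn(x)
    # create_graph=True records the gradient computation itself,
    # so the returned penalty can be backpropagated a second time.
    (grad_x,) = torch.autograd.grad(out.sum(), x, create_graph=True)
    return grad_x.pow(2).sum()


torch.manual_seed(0)
rnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True).double()
x = torch.randn(2, 5, 8, dtype=torch.float64)

penalty = input_grad_penalty(rnn, x)
# Double backward: gradients of the penalty w.r.t. the LSTM weights.
penalty.backward()
```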

generic

Official Documentation

https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnRNNBackwardData

Workarounds

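85% success: Switch to a non-projected LSTM (remove the projection) or use GRU instead, which has broader double backward support. Example: change nn.LSTM(input_size, hidden_size, proj_size=proj_size), where 0 < proj_size < hidden_size, to nn.LSTM(input_size, hidden_size); the output feature size then grows from proj_size to hidden_size, so downstream layers must be resized.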
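A minimal sketch of this workaround, assuming the model currently builds a projected LSTM; `make_rnn` and its `kind` values are hypothetical names used only for illustration. The printed shapes show how dropping the projection changes the output feature size.

```python
import torch
import torch.nn as nn


def make_rnn(input_size: int, hidden_size: int, kind: str = "lstm") -> nn.Module:
    """Hypothetical factory for the RNN variants discussed in this workaround."""
    if kind == "lstm_proj":
        # Projected LSTM (the configuration this entry reports as problematic).
        # PyTorch requires 0 < proj_size < hidden_size.
        return nn.LSTM(input_size, hidden_size, proj_size=hidden_size // 2, batch_first=True)
    if kind == "lstm":
        # Plain LSTM without projection.
        return nn.LSTM(input_size, hidden_size, batch_first=True)
    if kind == "gru":
        return nn.GRU(input_size, hidden_size, batch_first=True)
    raise ValueError(f"unknown kind: {kind}")


x = torch.randn(2, 5, 8)  # (batch, time, features)
for kind in ("lstm_proj", "lstm", "gru"):
    out, _ = make_rnn(8, 16, kind)(x)
    print(kind, tuple(out.shape))
# lstm_proj (2, 5, 8)
# lstm (2, 5, 16)
# gru (2, 5, 16)
```

Note that an `nn.Linear` applied after a plain LSTM can restore the old output width, but it is not equivalent to `proj_size`: the built-in projection also feeds the projected hidden state back into the recurrence.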
75% success: Use torch.autograd.grad with create_graph=False for the first backward pass, and implement the double backward manually with a torch.autograd.Function whose custom backward does not rely on cuDNN's RNN backward-data routine.
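A minimal sketch of the key idea behind this workaround: a `torch.autograd.Function` whose `backward` is written entirely with ordinary differentiable tensor ops, so autograd can differentiate it again and no cuDNN RNN backward routine is involved. For brevity it implements a single Elman (tanh) RNN step rather than an LSTM; `TanhRNNCell` and `run_rnn` are hypothetical names, and a real replacement would need the LSTM gate equations.

```python
import torch


class TanhRNNCell(torch.autograd.Function):
    """Hypothetical single RNN step: h_new = tanh(x @ w_ih.T + h @ w_hh.T + b)."""

    @staticmethod
    def forward(ctx, x, h, w_ih, w_hh, b):
        ctx.save_for_backward(x, h, w_ih, w_hh, b)
        return torch.tanh(x @ w_ih.T + h @ w_hh.T + b)

    @staticmethod
    def backward(ctx, grad_out):
        x, h, w_ih, w_hh, b = ctx.saved_tensors
        # Recompute the activation with differentiable ops. When autograd runs
        # this backward with create_graph=True, these ops are recorded, which
        # is what makes a second backward pass possible.
        t = torch.tanh(x @ w_ih.T + h @ w_hh.T + b)
        grad_z = grad_out * (1 - t * t)  # d tanh(z)/dz = 1 - tanh(z)^2
        grad_x = grad_z @ w_ih
        grad_h = grad_z @ w_hh
        grad_w_ih = grad_z.T @ x
        grad_w_hh = grad_z.T @ h
        grad_b = grad_z.sum(0)
        return grad_x, grad_h, grad_w_ih, grad_w_hh, grad_b


def run_rnn(xs, h0, w_ih, w_hh, b):
    """Unroll the cell over the time dimension of xs (batch, time, features)."""
    h = h0
    for step in range(xs.shape[1]):
        h = TanhRNNCell.apply(xs[:, step], h, w_ih, w_hh, b)
    return h
```

`torch.autograd.gradcheck` and `torch.autograd.gradgradcheck` (in float64) can confirm that the first and second derivatives of such a Function match finite differences.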

Dead Ends

Common approaches that don't work:

80% fail
Upgrading cuDNN does not add double backward support for every RNN mode; the limitation is architectural in cuDNN v8.
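Before upgrading anything, it helps to confirm which cuDNN build PyTorch actually loads: official PyTorch binaries ship with their own cuDNN, so upgrading a system-wide install may not change it. A small sketch using `torch.backends.cudnn.version()`, which returns `None` when cuDNN is unavailable; `cudnn_report` is a hypothetical helper.

```python
import torch


def cudnn_report() -> dict:
    """Hypothetical helper: versions and flags relevant to cuDNN RNN errors."""
    return {
        "torch": torch.__version__,
        "cuda_runtime": torch.version.cuda,  # None on CPU-only builds
        "cudnn_available": torch.backends.cudnn.is_available(),
        # Integer version, e.g. 8902 for cuDNN 8.9.2; None if cuDNN is unavailable.
        "cudnn_version": torch.backends.cudnn.version(),
        "cudnn_enabled": torch.backends.cudnn.enabled,
    }


print(cudnn_report())
```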
70% fail
Setting torch.backends.cudnn.enabled = False forces PyTorch's non-cuDNN RNN implementation, which may be slower or numerically slightly different; double backward still fails if the RNN implementation in use does not support it.
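If you do try the fallback, one way to limit the performance cost is to disable cuDNN only for the forward pass that must be differentiated twice. Autograd records which implementation ran at forward time, so the later backward passes follow that same non-cuDNN path. A minimal sketch; `cudnn_disabled` and `grad_penalty_without_cudnn` are hypothetical helpers, and whether the fallback supports double backward for your particular module still has to be verified.

```python
from contextlib import contextmanager

import torch
import torch.nn as nn


@contextmanager
def cudnn_disabled():
    """Hypothetical helper: temporarily set torch.backends.cudnn.enabled = False."""
    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = False
    try:
        yield
    finally:
        torch.backends.cudnn.enabled = previous


def grad_penalty_without_cudnn(rnn: nn.Module, x: torch.Tensor) -> torch.Tensor:
    x = x.detach().requires_grad_(True)
    # Only this forward pass avoids cuDNN; the rest of the training step is unaffected.
    with cudnn_disabled():
        out, _ = rnn(x)
    (grad_x,) = torch.autograd.grad(out.sum(), x, create_graph=True)
    return grad_x.pow(2).sum()


device = "cuda" if torch.cuda.is_available() else "cpu"
rnn = nn.LSTM(8, 16, batch_first=True).to(device=device, dtype=torch.float64)
x = torch.randn(2, 5, 8, device=device, dtype=torch.float64)
grad_penalty_without_cudnn(rnn, x).backward()
```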
90% fail
Using retain_graph=True without detaching intermediate activations does not prevent the error; the double backward path still triggers the unsupported cuDNN routine.
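`retain_graph=True` does not help because it only keeps the first graph's buffers alive for another backward pass; it is `create_graph=True` that records the gradient computation and produces the double-backward path. A scalar illustration, independent of any RNN:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# retain_graph=True: the graph is kept for reuse, but the gradient is not differentiable.
(g_retained,) = torch.autograd.grad(y, x, retain_graph=True)
print(g_retained, g_retained.requires_grad)  # tensor(12.) False

# create_graph=True: the gradient carries its own graph and can be differentiated again.
(g_created,) = torch.autograd.grad(y, x, create_graph=True)
(second,) = torch.autograd.grad(g_created, x)  # d^2y/dx^2 = 6x
print(second)  # tensor(12.)
```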