CUDNN_STATUS_NOT_SUPPORTED (5) cuda runtime_error ai_generated partial

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED when calling cudnnRNNBackwardData_v8 with training mode enabled and double backward

ID: cuda/cudnn-rnn-double-backward

78% fix rate

82% confidence

1 evidence item

First seen 2023-10-25

Version compatibility

Version	Status	Introduced	Deprecated	Notes
cuDNN 8.9.0	active	—	—	—
cuDNN 8.9.5	active	—	—	—
PyTorch 2.1.0	active	—	—	—
PyTorch 2.2.0	active	—	—	—

Root cause analysis

cuDNN RNN backward operations (in particular backward-data under double backward) are not supported for certain RNN modes, such as LSTM with projection, or when the input tensor requires grad and the graph is retained. cuDNN v8 restricts double backward support to specific configurations.
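For readers unfamiliar with the term, "double backward" means differentiating a gradient a second time, as in a gradient penalty. The following is a minimal sketch of that call pattern, assuming PyTorch is installed. It runs on CPU in double precision, where PyTorch's native RNN kernels are used, so it only illustrates the pattern; per this entry, the same two calls on a CUDA device with cuDNN enabled are where the error is reported. The sizes are arbitrary illustration values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# CPU, double precision: the native (non-cuDNN) RNN path is used here.
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True).double()
x = torch.randn(4, 5, 8, dtype=torch.double, requires_grad=True)

out, _ = lstm(x)

# First backward: create_graph=True keeps the graph of the gradient itself.
(grad_x,) = torch.autograd.grad(out.sum(), x, create_graph=True)

# Second backward: differentiates through the first backward pass.
penalty = grad_x.pow(2).sum()
penalty.backward()
```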

generic

Official documentation

https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnRNNBackwardData

Solutions

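Switch to a non-projected LSTM (remove the projection) or use a GRU instead, which has broader double backward support. Example: change nn.LSTM(input_size, hidden_size, proj_size=proj_size) to nn.LSTM(input_size, hidden_size). PyTorch requires proj_size to be smaller than hidden_size, and removing the projection changes the output feature size from proj_size to hidden_size, so downstream layers may need resizing.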
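The effect of dropping the projection on tensor shapes can be sketched as follows. This is a minimal illustration, assuming PyTorch is installed; the sizes and the choice of `hidden_size // 2` for the projection are arbitrary illustration values.

```python
import torch
import torch.nn as nn

input_size, hidden_size = 8, 16

# Projected LSTM: proj_size must be strictly smaller than hidden_size.
projected = nn.LSTM(input_size, hidden_size,
                    proj_size=hidden_size // 2, batch_first=True)

# Replacement without projection.
plain = nn.LSTM(input_size, hidden_size, batch_first=True)

x = torch.randn(4, 5, input_size)  # (batch, sequence, features)
out_proj, _ = projected(x)
out_plain, _ = plain(x)

# The last dimension changes from proj_size to hidden_size.
print(out_proj.shape, out_plain.shape)
```

Any layer that consumed the projected output needs its input size updated accordingly.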

Use torch.autograd.grad with create_graph=False for the backward pass, and manually implement double backward using torch.autograd.Function with a custom backward that does not rely on cuDNN RNN backward data.
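The custom `torch.autograd.Function` part of this approach can be sketched on a much smaller case. The example below is an assumption-laden illustration, not a drop-in LSTM replacement: it implements a single tanh RNN step, and `TanhRNNStep` is a hypothetical name. Its backward is written with ordinary differentiable tensor operations, so autograd can differentiate it a second time without calling any cuDNN RNN backward routine.

```python
import torch


class TanhRNNStep(torch.autograd.Function):
    """Hypothetical single RNN step: h_new = tanh(x @ w_ih.T + h @ w_hh.T)."""

    @staticmethod
    def forward(ctx, x, h, w_ih, w_hh):
        h_new = torch.tanh(x @ w_ih.t() + h @ w_hh.t())
        ctx.save_for_backward(x, h, w_ih, w_hh, h_new)
        return h_new

    @staticmethod
    def backward(ctx, grad_out):
        x, h, w_ih, w_hh, h_new = ctx.saved_tensors
        # d tanh(z)/dz = 1 - tanh(z)^2, built from differentiable ops
        # so this backward can itself be differentiated.
        grad_pre = grad_out * (1 - h_new * h_new)
        grad_x = grad_pre @ w_ih
        grad_h = grad_pre @ w_hh
        grad_w_ih = grad_pre.t() @ x
        grad_w_hh = grad_pre.t() @ h
        return grad_x, grad_h, grad_w_ih, grad_w_hh


torch.manual_seed(0)
batch, input_size, hidden_size = 3, 4, 5
opts = dict(dtype=torch.double, requires_grad=True)
x = torch.randn(batch, input_size, **opts)
h = torch.randn(batch, hidden_size, **opts)
w_ih = torch.randn(hidden_size, input_size, **opts)
w_hh = torch.randn(hidden_size, hidden_size, **opts)

h_new = TanhRNNStep.apply(x, h, w_ih, w_hh)
(grad_x,) = torch.autograd.grad(h_new.sum(), x, create_graph=True)
penalty = grad_x.pow(2).sum()
penalty.backward()  # double backward through the custom backward
```

A hand-written backward like this is easy to get subtly wrong, so it is worth validating with `torch.autograd.gradcheck` and `torch.autograd.gradgradcheck` on double-precision inputs before relying on it.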

Attempts that do not work

Common but ineffective approaches:

80% failure
Upgrading cuDNN does not add double backward support for all RNN modes; the limitation is architectural in cuDNN v8.
70% failure
Setting torch.backends.cudnn.enabled=False forces a fallback to the non-cuDNN RNN implementation, which may cause a performance regression or different numerical behavior; double backward still fails if the replacement RNN implementation does not support it.
90% failure
Using retain_graph=True without detaching intermediate activations does not prevent the error; the double backward path still triggers the unsupported cuDNN routine.
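Whether the non-cuDNN fallback in the second item supports double backward for a given model can be checked directly rather than assumed. The following is a minimal sketch, assuming PyTorch is installed; `double_backward_ok` is a hypothetical helper name. It uses the `torch.backends.cudnn.flags(enabled=False)` context manager to disable cuDNN only around the forward pass instead of globally, which limits any performance cost to the code that needs second-order gradients.

```python
import torch
import torch.nn as nn


def double_backward_ok(rnn: nn.Module, x: torch.Tensor) -> bool:
    """Hypothetical helper: True if a gradient-of-gradient pass completes."""
    x = x.detach().clone().requires_grad_(True)
    params = [p for p in rnn.parameters() if p.requires_grad]
    try:
        # Disable cuDNN only for this forward pass (no effect on CPU tensors).
        with torch.backends.cudnn.flags(enabled=False):
            out, _ = rnn(x)
        (grad_x,) = torch.autograd.grad(out.sum(), x, create_graph=True)
        # autograd.grad avoids accumulating into the parameters' .grad fields.
        torch.autograd.grad(grad_x.pow(2).sum(), params, allow_unused=True)
        return True
    except (RuntimeError, NotImplementedError):
        return False


device = "cuda" if torch.cuda.is_available() else "cpu"
rnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True).double().to(device)
x = torch.randn(4, 5, 8, dtype=torch.double, device=device)
print(double_backward_ok(rnn, x))
```

The helper only reports whether the pass completes; it does not compare the numerical results against the cuDNN path.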