{
  "id": "cuda/cudnn-rnn-hidden-size-mismatch",
  "signature": "RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM when calling cudnnSetRNNDescriptor_v8",
  "signature_zh": "运行时错误：调用 cudnnSetRNNDescriptor_v8 时出现 CUDNN_STATUS_BAD_PARAM",
  "regex": "CUDNN_STATUS_BAD_PARAM when calling cudnnSetRNNDescriptor",
  "domain": "cuda",
  "category": "runtime_error",
  "subcategory": null,
  "root_cause": "The hidden size provided to an RNN/LSTM/GRU layer is not a multiple of 32 or 64 (depending on cuDNN version and RNN mode), violating cuDNN's alignment requirement for performance kernels, or the number of layers is zero.",
  "root_cause_type": "generic",
  "root_cause_zh": "提供给 RNN/LSTM/GRU 层的隐藏层大小不是 32 或 64 的倍数（取决于 cuDNN 版本和 RNN 模式），违反了 cuDNN 性能内核的对齐要求，或层数为零。",
  "versions": [
    {
      "version": "cuDNN 8.9.0",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    },
    {
      "version": "cuDNN 8.9.5",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    },
    {
      "version": "PyTorch 2.1.0",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    },
    {
      "version": "TensorFlow 2.14",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    }
  ],
  "os_specific": {},
  "dead_ends": [
    {
      "action": "Setting `torch.backends.cudnn.enabled = False` to disable cuDNN",
      "why_fails": "Disabling cuDNN may fall back to a non-cuDNN RNN implementation that still validates hidden size; also significantly degrades performance.",
      "fail_rate": 0.7,
      "condition": "",
      "sources": []
    },
    {
      "action": "Reducing the number of RNN layers arbitrarily",
      "why_fails": "The error is about hidden size alignment, not layer count; reducing layers only helps if num_layers was zero, which is rare.",
      "fail_rate": 0.9,
      "condition": "",
      "sources": []
    },
    {
      "action": "Switching to a different RNN cell type (e.g., LSTM to GRU) without changing hidden size",
      "why_fails": "The alignment requirement applies to all cuDNN RNN cells; the error persists if hidden size is not a multiple of the alignment.",
      "fail_rate": 0.85,
      "condition": "",
      "sources": []
    }
  ],
  "workarounds": [
    {
      "action": "Set the hidden size to a multiple of 64 (or 32 for some cuDNN versions). For example, if hidden_size=100, change to 128. In PyTorch: `nn.LSTM(input_size, hidden_size=128, num_layers=2)`. Verify by checking `hidden_size % 64 == 0`.",
      "success_rate": 0.9,
      "how": "Set the hidden size to a multiple of 64 (or 32 for some cuDNN versions). For example, if hidden_size=100, change to 128. In PyTorch: `nn.LSTM(input_size, hidden_size=128, num_layers=2)`. Verify by checking `hidden_size % 64 == 0`.",
      "condition": "",
      "sources": []
    },
    {
      "action": "If you must keep an arbitrary hidden size, use `torch.backends.cudnn.rnn.allow_tf32 = False` and set `torch.backends.cudnn.deterministic = True` to force a fallback implementation that may not enforce alignment (performance penalty).",
      "success_rate": 0.7,
      "how": "If you must keep an arbitrary hidden size, use `torch.backends.cudnn.rnn.allow_tf32 = False` and set `torch.backends.cudnn.deterministic = True` to force a fallback implementation that may not enforce alignment (performance penalty).",
      "condition": "",
      "sources": []
    },
    {
      "action": "Explicitly pad the hidden state tensor to the next multiple of 64 using `torch.nn.functional.pad` before passing to the RNN, then slice the output back to the original size.",
      "success_rate": 0.8,
      "how": "Explicitly pad the hidden state tensor to the next multiple of 64 using `torch.nn.functional.pad` before passing to the RNN, then slice the output back to the original size.",
      "condition": "",
      "sources": []
    }
  ],
  "workarounds_zh": [
    "将隐藏层大小设置为 64 的倍数（某些 cuDNN 版本为 32）。例如，如果 hidden_size=100，改为 128。在 PyTorch 中：`nn.LSTM(input_size, hidden_size=128, num_layers=2)`。通过检查 `hidden_size % 64 == 0` 验证。",
    "如果必须保留任意隐藏层大小，设置 `torch.backends.cudnn.rnn.allow_tf32 = False` 和 `torch.backends.cudnn.deterministic = True` 强制回退到可能不强制对齐的实现（性能损失）。",
    "在传递给 RNN 之前，使用 `torch.nn.functional.pad` 将隐藏状态张量显式填充到下一个 64 的倍数，然后将输出切片回原始大小。"
  ],
  "transition_graph": {
    "leads_to": [],
    "preceded_by": [],
    "frequently_confused_with": []
  },
  "official_doc_url": "https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetRNNDescriptor",
  "official_doc_section": null,
  "error_code": "CUDNN_STATUS_BAD_PARAM",
  "verification_tier": "ai_generated",
  "confidence": 0.88,
  "fix_success_rate": 0.85,
  "resolvable": "true",
  "first_seen": "2023-06-20",
  "last_confirmed": "2024-06-01",
  "last_updated": "2025-02-15",
  "evidence_count": 1,
  "tags": [],
  "locale": "en",
  "aliases": []
}