Tags: pytorch, runtime_error; ai_generated: true

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

ID: pytorch/cudnn-deterministic-error

Fix Rate: 75%
Confidence: 83%
Evidence: 1
First Seen: 2023-04-02

Version Compatibility

Version          Status   Introduced   Deprecated   Notes
torch>=1.10.0    active   -            -            -
cuDNN>=8.0       active   -            -            -

Root Cause

cuDNN was not properly initialized, often because of an inconsistent CUDA context or because a cuDNN handle was used after the CUDA device was reset.
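Before changing anything, it can help to see what PyTorch reports about CUDA and cuDNN in the failing process. The following is a minimal diagnostic sketch, not part of the original entry: the helper name cudnn_report is hypothetical, and the import is guarded so the snippet also runs on a machine without PyTorch installed.

```python
import importlib.util


def cudnn_report():
    """Collect basic facts about the CUDA/cuDNN setup visible to PyTorch.

    Hypothetical helper for illustration; it only reads state and does
    not initialize CUDA itself.
    """
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    # True if a usable CUDA device and driver are present.
    report["cuda_available"] = torch.cuda.is_available()
    # True if this PyTorch build was compiled with cuDNN support.
    report["cudnn_available"] = torch.backends.cudnn.is_available()
    # Integer cuDNN version, or None when cuDNN is not available.
    report["cudnn_version"] = torch.backends.cudnn.version()
    # Whether this process has already initialized PyTorch's CUDA state.
    report["cuda_initialized"] = torch.cuda.is_initialized()
    return report


if __name__ == "__main__":
    for key, value in cudnn_report().items():
        print(f"{key}: {value}")
```

Running this early in the program and again just before the failing call shows whether CUDA was already initialized at that point, which is one hint toward the context problem described above.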

generic

Official Documentation

https://docs.nvidia.com/deeplearning/cudnn/api/index.html

Workarounds

  1. (80% success) Use a single CUDA context per process. Call torch.cuda.init() once at startup, before any other CUDA work, and avoid code paths that create additional contexts.
  2. (70% success) Set torch.backends.cudnn.deterministic = True and torch.backends.cudnn.benchmark = False to avoid handle conflicts.
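
The two workarounds above can be combined into one startup routine. This is a sketch under stated assumptions rather than a definitive fix: the function name configure_cuda_once is hypothetical, and the guarded import only exists so the snippet degrades gracefully where PyTorch or a GPU is missing.

```python
import importlib.util


def configure_cuda_once():
    """Apply both workarounds at program start.

    Hypothetical helper for illustration. Returns True if CUDA is
    available and initialized after the call, False otherwise.
    """
    if importlib.util.find_spec("torch") is None:
        return False

    import torch

    # Workaround 2: deterministic cuDNN algorithms, no autotuning.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

    if not torch.cuda.is_available():
        return False

    # Workaround 1: initialize CUDA explicitly, and only once per process.
    if not torch.cuda.is_initialized():
        torch.cuda.init()
    return True


if __name__ == "__main__":
    print("CUDA ready:", configure_cuda_once())
```

Calling the function at the top of the entry-point script, before any model or tensor is moved to the GPU, keeps initialization in one place. Note that disabling benchmark mode can slow down convolution-heavy workloads.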

Dead Ends

Common approaches that don't work:

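  1. (90% fail) Calling torch.cuda.empty_cache(). It only releases cached, unused GPU memory held by PyTorch's allocator; it does not reinitialize cuDNN, and may cause further issues.
  2. (85% fail) Reinstalling PyTorch or cuDNN without addressing the CUDA context issue. The error returns as long as the context problem remains.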