ncclSystemError
cuda
config_error
ai_generated
true
RuntimeError: NCCL error: version mismatch, expected 2.18.5 but got 2.19.1
ID: cuda/nccl-version-mismatch
Fix Rate: 90%
Confidence: 87%
Evidence: 1
First Seen: 2024-03-20
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| NCCL 2.18.5 | active | — | — | — |
| NCCL 2.19.1 | active | — | — | — |
| PyTorch 2.1.0 | active | — | — | — |
Root Cause
The NCCL library loaded at runtime is a different version from the one PyTorch expects, usually because multiple NCCL installations are present or `LD_LIBRARY_PATH` points at the wrong one.
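To check whether more than one NCCL copy is reachable, the directories on `LD_LIBRARY_PATH` can be scanned for `libnccl` files. The following is a minimal sketch, not a full model of the dynamic loader: `find_nccl_libraries` is a hypothetical helper name, and it ignores RPATH/RUNPATH entries and the `ldconfig` cache, which can also influence which library gets loaded.

```python
import os
from pathlib import Path


def find_nccl_libraries(ld_library_path=None):
    """Return every libnccl shared object found on the given search path.

    ld_library_path defaults to the LD_LIBRARY_PATH environment variable.
    Results follow the order of the directories on the path, so earlier
    entries come from directories that are searched first.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    found = []
    for entry in ld_library_path.split(os.pathsep):
        if not entry:
            continue  # skip empty segments such as a trailing ':'
        directory = Path(entry)
        if not directory.is_dir():
            continue
        # sorted() keeps the output stable within one directory
        for candidate in sorted(directory.glob("libnccl.so*")):
            found.append(str(candidate))
    return found


if __name__ == "__main__":
    for path in find_nccl_libraries():
        print(path)
```

If the output lists `libnccl` files from more than one directory, that is consistent with the multiple-installation cause described above.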
Official Documentation
https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/index.html
Workarounds
- 90% success: Set the environment variable `LD_LIBRARY_PATH` to point to the correct NCCL installation. For example: `export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/nccl:$LD_LIBRARY_PATH`. Alternatively, use `conda install -c conda-forge nccl` to ensure consistency.
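After changing `LD_LIBRARY_PATH`, it can help to confirm which `libnccl` file the Python process actually mapped. The sketch below assumes Linux (it reads `/proc/self/maps`) and a CUDA-enabled PyTorch build where `torch.cuda.nccl.version()` is available. `nccl_paths_from_maps` and `report_loaded_nccl` are hypothetical helper names. If NCCL is statically linked into PyTorch, no separate `libnccl` entry will appear.

```python
def nccl_paths_from_maps(maps_text):
    """Extract unique libnccl paths from text in /proc/<pid>/maps format."""
    paths = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        # The sixth field, when present, is the backing file path.
        if len(fields) == 6 and "libnccl" in fields[5]:
            path = fields[5].strip()
            if path not in paths:
                paths.append(path)
    return paths


def report_loaded_nccl():
    """Print PyTorch's reported NCCL version and the libnccl files mapped here."""
    import torch  # assumption: PyTorch is installed with CUDA/NCCL support

    print("NCCL version reported by PyTorch:", torch.cuda.nccl.version())
    with open("/proc/self/maps") as maps:  # Linux only
        for path in nccl_paths_from_maps(maps.read()):
            print("loaded:", path)
```

Calling `report_loaded_nccl()` in the same environment that runs the training job shows whether the mapped library comes from the directory that was intended.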
Dead Ends
Common approaches that don't work:
- 95% fail: Debugging does not resolve binary incompatibility.
- 70% fail: PyTorch bundles its own NCCL, but system paths can override it.