CUSOLVER_STATUS_INTERNAL_ERROR
cuda
runtime_error
ai_generated: true
RuntimeError: cusolver error: CUSOLVER_STATUS_INTERNAL_ERROR when computing SVD of a singular matrix
ID: cuda/cusolver-internal-error-on-svd
Fix Rate: 76%
Confidence: 84%
Evidence: 1
First Seen: 2025-03-12
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| CUDA 12.4 | active | — | — | — |
| cuSolver 11.5.1 | active | — | — | — |
| PyTorch 2.3.0 | active | — | — | — |
Root Cause
cuSolver's SVD routine (gesvdj or gesvd) fails internally when the input matrix is exactly singular or contains NaN/inf values, which causes a buffer overflow or division by zero in the iterative solver.
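A quick way to check whether an input matches the conditions named above is to test it for non-finite values and rank deficiency before calling the SVD. The following is a minimal sketch, not part of the original entry: the helper name `diagnose_svd_input` is hypothetical, it assumes a single 2-D matrix, and it deliberately runs the rank check on a CPU copy so that the diagnosis does not go through the same cuSolver path.

```python
import torch


def diagnose_svd_input(A: torch.Tensor) -> dict:
    """Report whether a 2-D matrix has non-finite entries or is rank deficient."""
    finite = bool(torch.isfinite(A).all())
    report = {"all_finite": finite, "rank": None, "full_rank": None}
    if finite:
        # matrix_rank is SVD-based itself, so it is run on a float64 CPU copy
        # to avoid triggering the GPU code path while diagnosing.
        rank = int(torch.linalg.matrix_rank(A.detach().cpu().double()))
        report["rank"] = rank
        report["full_rank"] = rank == min(A.shape[-2:])
    return report


device = "cuda" if torch.cuda.is_available() else "cpu"
singular = torch.ones(3, 3, device=device)  # every row identical, so rank 1
with_nan = torch.tensor([[1.0, float("nan")], [0.0, 1.0]], device=device)

print(diagnose_svd_input(singular))
print(diagnose_svd_input(with_nan))
```

If `all_finite` is false, cleaning the data is likely a better first step than any of the SVD-specific workarounds.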
Official Documentation
https://docs.nvidia.com/cuda/cusolver/index.html
Workarounds
- 85% success: Preprocess the matrix to remove exact singularities by adding a small regularization term to the diagonal before calling torch.linalg.svd. Example for a square matrix: A_reg = A + 1e-8 * torch.eye(A.size(0), dtype=A.dtype, device=A.device); U, S, Vh = torch.linalg.svd(A_reg). Note that torch.linalg.svd returns Vh (the conjugate transpose of V), not V, and that 1e-8 is below float32 rounding resolution for diagonal entries near 1, so float32 inputs may need a larger term.
- 78% success: For least-squares problems, use torch.linalg.lstsq instead of an explicit SVD. For CUDA inputs the only available driver is 'gels', which assumes a full-rank matrix; the rank-revealing drivers ('gelsy', 'gelsd', 'gelss') that handle singular matrices robustly are available for CPU inputs only.
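The regularization workaround can be sketched as a small helper. This is an illustrative sketch rather than a definitive fix: the function name `regularized_svd` and the default `eps` are assumptions, and the diagonal shift only makes sense for square matrices. The example uses float64 so that the 1e-8 shift is representable next to entries of size 1.

```python
import torch


def regularized_svd(A: torch.Tensor, eps: float = 1e-8):
    """Compute the SVD of A + eps * I for a square 2-D matrix A."""
    if A.ndim != 2 or A.size(0) != A.size(1):
        raise ValueError("diagonal regularization assumes a square 2-D matrix")
    eye = torch.eye(A.size(0), dtype=A.dtype, device=A.device)
    A_reg = A + eps * eye  # out-of-place, so the caller's A is left unchanged
    # torch.linalg.svd returns (U, S, Vh); Vh is the conjugate transpose of V.
    return torch.linalg.svd(A_reg)


device = "cuda" if torch.cuda.is_available() else "cpu"
A = torch.ones(3, 3, dtype=torch.float64, device=device)  # exactly singular
U, S, Vh = regularized_svd(A)

# The shifted matrix still reconstructs A up to roughly eps.
reconstruction = U @ torch.diag(S) @ Vh
print(torch.allclose(reconstruction, A, atol=1e-6))
```

The shift perturbs the result, so `eps` is a trade-off between avoiding the failure and staying close to the original matrix.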
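For the least-squares workaround, the choice of driver matters when the matrix is rank deficient. The sketch below is an assumption-laden illustration: the helper name `lstsq_min_norm` is hypothetical, and it copies the inputs to the CPU in order to use the 'gelsd' driver, because PyTorch documents 'gels' (which assumes full rank) as the only driver for CUDA inputs.

```python
import torch


def lstsq_min_norm(A: torch.Tensor, b: torch.Tensor):
    """Solve min ||A x - b|| with a rank-revealing driver.

    Returns the minimum-norm solution on A's original device and the
    effective rank reported by the driver.
    """
    # 'gelsd' is an SVD-based LAPACK driver that tolerates rank deficiency;
    # it is available for CPU tensors only, hence the explicit copies.
    result = torch.linalg.lstsq(A.cpu(), b.cpu(), driver="gelsd")
    return result.solution.to(A.device), int(result.rank)


device = "cuda" if torch.cuda.is_available() else "cpu"

# Rank-deficient system: the second column is twice the first.
A = torch.tensor([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]],
                 dtype=torch.float64, device=device)
b = torch.tensor([[1.0], [2.0], [3.0]], dtype=torch.float64, device=device)

x, rank = lstsq_min_norm(A, b)
print(x.flatten().tolist(), rank)
```

For matrices known to be full rank, calling `torch.linalg.lstsq` directly on the GPU tensor avoids the device round trip.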
Dead Ends
Common approaches that don't work:
- 60% fail: Running the SVD on the CPU. This works but defeats the purpose of GPU acceleration; also, the error may still occur on CPU if the matrix is singular.
- 85% fail: Switching to higher precision. Singular matrices remain singular regardless of precision; the error is algorithmic, not numerical.
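The point about precision can be checked directly. The following sketch, with a hypothetical helper name `rank_by_dtype` and a hand-picked 2x2 example, computes the numerical rank of the same exactly singular matrix in float32 and float64 on the CPU.

```python
import torch


def rank_by_dtype(A: torch.Tensor) -> dict:
    """Return the numerical rank of A evaluated in float32 and in float64."""
    ranks = {}
    for dtype in (torch.float32, torch.float64):
        # Run on the CPU so the check does not depend on cuSolver.
        ranks[dtype] = int(torch.linalg.matrix_rank(A.cpu().to(dtype)))
    return ranks


# The second row is exactly twice the first, so the matrix has rank 1.
A = torch.tensor([[1.0, 2.0], [2.0, 4.0]])
print(rank_by_dtype(A))
```

Both precisions report the same rank deficiency, which is consistent with the claim that casting to float64 does not remove an exact singularity.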