{
  "id": "cuda/cublas-gemm-broadcast-dimension-mismatch",
  "signature": "RuntimeError: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, batchCount )",
  "signature_zh": "运行时错误：调用 cublasSgemmStridedBatched 时出现 CUBLAS_STATUS_INVALID_VALUE",
  "regex": "CUBLAS_STATUS_INVALID_VALUE when calling cublasS?[DH]?gemm(StridedBatched)?",
  "domain": "cuda",
  "category": "runtime_error",
  "subcategory": null,
  "root_cause": "The GEMM dimensions (m, n, k) derived from tensor shapes are incompatible or non-positive, often due to a batch broadcast operation that produces a dimension of zero or a leading dimension (lda/ldb/ldc) violation.",
  "root_cause_type": "generic",
  "root_cause_zh": "从张量形状推导出的 GEMM 维度（m, n, k）不兼容或非正数，通常是由于批次广播操作产生了零维度或前导维度（lda/ldb/ldc）冲突。",
  "versions": [
    {
      "version": "CUDA 11.8",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    },
    {
      "version": "CUDA 12.1",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    },
    {
      "version": "cuBLAS 11.11",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    },
    {
      "version": "PyTorch 2.0.1",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    }
  ],
  "os_specific": {},
  "dead_ends": [
    {
      "action": "Restarting the kernel or clearing CUDA cache",
      "why_fails": "The error is a dimension validation failure, not a memory or state issue; restarting does not fix the invalid tensor shapes.",
      "fail_rate": 0.95,
      "condition": "",
      "sources": []
    },
    {
      "action": "Increasing batch size to avoid zero-sized batches",
      "why_fails": "The error is not about batch size being zero per se, but about a mismatch in m/n/k derived from batched tensor broadcasting; arbitrary batch size changes can mask the real shape bug.",
      "fail_rate": 0.8,
      "condition": "",
      "sources": []
    },
    {
      "action": "Downgrading cuBLAS to an older version",
      "why_fails": "The dimension validation is consistent across cuBLAS versions; older versions may have the same check or even stricter checks.",
      "fail_rate": 0.9,
      "condition": "",
      "sources": []
    }
  ],
  "workarounds": [
    {
      "action": "Print and verify the shapes of all tensors passed to the GEMM operation before calling the matmul. Ensure that the last dimensions of the two input matrices are compatible (e.g., for `torch.matmul(A, B)`, A.shape[-1] == B.shape[-2]) and that no dimension is zero. Example: `print(A.shape, B.shape); assert A.shape[-1] == B.shape[-2] and all(d > 0 for d in A.shape + B.shape)`.",
      "success_rate": 0.85,
      "how": "Print and verify the shapes of all tensors passed to the GEMM operation before calling the matmul. Ensure that the last dimensions of the two input matrices are compatible (e.g., for `torch.matmul(A, B)`, A.shape[-1] == B.shape[-2]) and that no dimension is zero. Example: `print(A.shape, B.shape); assert A.shape[-1] == B.shape[-2] and all(d > 0 for d in A.shape + B.shape)`.",
      "condition": "",
      "sources": []
    },
    {
      "action": "If using batched operations with broadcasting, explicitly expand the smaller tensor to match the batch dimensions using `torch.broadcast_to` or `unsqueeze` + `expand` before the matmul, ensuring all batch dimensions are consistent.",
      "success_rate": 0.75,
      "how": "If using batched operations with broadcasting, explicitly expand the smaller tensor to match the batch dimensions using `torch.broadcast_to` or `unsqueeze` + `expand` before the matmul, ensuring all batch dimensions are consistent.",
      "condition": "",
      "sources": []
    },
    {
      "action": "Set environment variable `CUBLAS_LOGINFO=1` to enable cuBLAS logging and capture the exact GEMM parameters (m, n, k, lda, etc.) being passed; cross-check these against the tensor shapes.",
      "success_rate": 0.7,
      "how": "Set environment variable `CUBLAS_LOGINFO=1` to enable cuBLAS logging and capture the exact GEMM parameters (m, n, k, lda, etc.) being passed; cross-check these against the tensor shapes.",
      "condition": "",
      "sources": []
    }
  ],
  "workarounds_zh": [
    "在调用矩阵乘法前打印并验证所有传入 GEMM 操作的张量形状。确保两个输入矩阵的最后一维兼容（例如，对于 `torch.matmul(A, B)`，A.shape[-1] == B.shape[-2]），且没有任何维度为零。示例：`print(A.shape, B.shape); assert A.shape[-1] == B.shape[-2] and all(d > 0 for d in A.shape + B.shape)`。",
    "如果使用带广播的批处理操作，在矩阵乘法前显式使用 `torch.broadcast_to` 或 `unsqueeze` + `expand` 将较小的张量扩展到匹配的批次维度，确保所有批次维度一致。",
    "设置环境变量 `CUBLAS_LOGINFO=1` 启用 cuBLAS 日志记录，捕获传递的确切 GEMM 参数（m, n, k, lda 等）；与张量形状交叉检查。"
  ],
  "transition_graph": {
    "leads_to": [],
    "preceded_by": [],
    "frequently_confused_with": []
  },
  "official_doc_url": "https://docs.nvidia.com/cuda/cublas/index.html#cublas-t-status",
  "official_doc_section": null,
  "error_code": "CUBLAS_STATUS_INVALID_VALUE",
  "verification_tier": "ai_generated",
  "confidence": 0.85,
  "fix_success_rate": 0.8,
  "resolvable": "true",
  "first_seen": "2023-08-15",
  "last_confirmed": "2024-06-01",
  "last_updated": "2025-03-01",
  "evidence_count": 1,
  "tags": [],
  "locale": "en",
  "aliases": []
}