# Parquet UINT64 column overflows when cast to signed INT64 in Spark or Arrow

- **ID:** `data/parquet-uint64-overflow-cast`
- **Domain:** data
- **Category:** type_error
- **Error Code:** `org.apache.spark.sql.AnalysisException: Overflow in sum of UINT64`
- **Verification:** ai_generated
- **Fix Rate:** 75%

## Root Cause

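The Parquet format defines an unsigned 64-bit logical type (UINT64), stored on the physical INT64 type. Spark SQL has no unsigned integer types, and a cast from UINT64 to signed INT64 in Spark or Arrow cannot represent values greater than 2^63-1: depending on the engine and cast mode, such values either wrap silently to negative numbers or raise an overflow error.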
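
The wraparound can be reproduced without any engine. The sketch below uses only Python's standard `struct` module to decode the same 8 bytes first as unsigned and then as signed; the helper name `reinterpret_as_int64` is hypothetical and only illustrates what a bit-preserving cast does, not how Spark or Arrow implement it.

```python
import struct

INT64_MAX = 2**63 - 1

def reinterpret_as_int64(value: int) -> int:
    """Reinterpret the 8 bytes of an unsigned 64-bit value as a signed one."""
    raw = struct.pack("<Q", value)       # encode as unsigned 64-bit, little-endian
    return struct.unpack("<q", raw)[0]   # decode the same bytes as signed 64-bit

print(reinterpret_as_int64(INT64_MAX))      # largest value that survives unchanged
print(reinterpret_as_int64(INT64_MAX + 1))  # wraps to the most negative INT64
print(reinterpret_as_int64(2**64 - 1))      # wraps to -1
```

Every value at or below 2^63-1 keeps its meaning, which is why the problem tends to surface only on the largest IDs, hashes, or counters in a column.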

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| Apache Parquet 2.8.0+ | active | — | — |
| Apache Spark 3.4.0 | active | — | — |
| Apache Arrow 12.0.0 | active | — | — |

## Workarounds

1. **Read the UINT64 column as a string in Spark** (85% success)
   ```
   from pyspark.sql.functions import col
   df = spark.read.parquet(path).withColumn("col", col("col").cast("string"))
   ```
2. **Read the file with PyArrow, which keeps the column as `uint64`, and convert to Python `int` rather than casting to `int64`** (90% success)
   ```
   import pyarrow.parquet as pq
   values = pq.read_table(path).column("col").to_pylist()  # column stays uint64; Python ints do not overflow
   ```
3. **Pre-process the data so that every UINT64 value fits within the INT64 range before writing Parquet** (70% success)
   ```
   INT64_MAX = 2**63 - 1
   assert max(values) <= INT64_MAX, "value exceeds INT64 range"  # values: the column as Python ints
   ```
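
The range check behind workaround 3 can be sketched in plain Python. This is a minimal illustration under stated assumptions: the column is available as a list of Python integers, and the helper name `split_by_int64_range` is hypothetical rather than part of Spark or PyArrow.

```python
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1

def split_by_int64_range(values):
    """Separate unsigned values that fit in INT64 from those that would overflow."""
    fits, overflows = [], []
    for v in values:
        if v < 0 or v > UINT64_MAX:
            raise ValueError(f"{v} is not a valid UINT64 value")
        # Values above INT64_MAX cannot be stored in a signed 64-bit column.
        (fits if v <= INT64_MAX else overflows).append(v)
    return fits, overflows

fits, overflows = split_by_int64_range([0, 42, 2**63 - 1, 2**63, 2**64 - 1])
print("safe to write as INT64:", fits)
print("need string or decimal storage:", overflows)
```

If the second list is non-empty, the data cannot be narrowed to INT64 without loss, and one of the other workarounds is the safer choice.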

## Dead Ends

- **Casting to Decimal(38,0) to hold larger values** — Decimal(38,0) can hold up to 10^38-1, but Spark's decimal precision is limited and arithmetic may still overflow or lose precision when converting back. (60% fail)
- **Using Double type to avoid overflow** — Double cannot represent all integers exactly beyond 2^53, causing silent precision loss for large UINT64 values. (80% fail)
- **Disabling Parquet type promotion entirely** — This may cause schema compatibility errors for other columns and does not address the root issue of UINT64 handling. (50% fail)
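
The precision argument behind the Double dead end can be checked in plain Python, whose `float` is an IEEE 754 double. The sketch below is an illustration of storage only: it uses the standard `decimal` module as a stand-in for an exact decimal type and says nothing about how Spark's Decimal arithmetic behaves.

```python
from decimal import Decimal

UINT64_MAX = 2**64 - 1

# A double has a 53-bit significand, so the first integer it cannot hold is 2^53 + 1.
first_lost = 2**53 + 1
double_keeps_first_lost = int(float(first_lost)) == first_lost

# The largest UINT64 value is rounded to a neighbouring representable double.
double_keeps_max = int(float(UINT64_MAX)) == UINT64_MAX

# An exact decimal representation stores all digits of the value.
decimal_keeps_max = int(Decimal(UINT64_MAX)) == UINT64_MAX
digits_needed = len(str(UINT64_MAX))

print("double keeps 2^53 + 1:", double_keeps_first_lost)
print("double keeps UINT64 max:", double_keeps_max)
print("decimal keeps UINT64 max:", decimal_keeps_max)
print("decimal digits needed for UINT64 max:", digits_needed)
```

The digit count shows how much decimal precision a UINT64 column needs for storage alone, before any arithmetic widens the result.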
