# Parquet INT96 timestamp reads as year 5000+ due to Julian date conversion error

- **ID:** `data/parquet-int96-timestamp-millennium-bug`
- **Domain:** data
- **Category:** data_error
- **Verification:** ai_generated
- **Fix Rate:** 80%

## Root Cause

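Parquet INT96 timestamps are 12-byte values: an 8-byte count of nanoseconds within the day followed by a 4-byte Julian day number (days counted from 4713 BC, so 1970-01-01 is Julian day 2,440,588). Some readers (e.g., older Hive and Impala versions) incorrectly interpret the Julian day as an offset from the Unix epoch, which pushes the decoded dates far into the future.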
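
The conversion can be sketched in plain Python. This is a minimal illustration that assumes the common little-endian INT96 layout (nanoseconds of day first, Julian day last). The function names are hypothetical, and `misread_as_epoch_days` only illustrates the class of error; it is not the actual code of any reader.

```python
import struct
from datetime import datetime, timedelta, timezone

JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_int96(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 timestamp into a UTC datetime (microsecond precision)."""
    if len(raw) != 12:
        raise ValueError("INT96 timestamps are exactly 12 bytes")
    # 8 bytes: nanoseconds since midnight; 4 bytes: Julian day number
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return UNIX_EPOCH + timedelta(days=days_since_epoch,
                                  microseconds=nanos_of_day // 1000)


def misread_as_epoch_days(raw: bytes) -> datetime:
    """Hypothetical faulty decode: treats the Julian day as days since 1970."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return UNIX_EPOCH + timedelta(days=julian_day,
                                  microseconds=nanos_of_day // 1000)


# 01:00:00 on Julian day 2,451,545 (2000-01-01)
raw = struct.pack("<qi", 3_600 * 10**9, 2_451_545)
print(decode_int96(raw))                # 2000-01-01 01:00:00+00:00
print(misread_as_epoch_days(raw).year)  # a year far beyond 5000
```

Skipping the subtraction of 2,440,588 days is enough to shift a date by several thousand years, which matches the symptom in the title.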

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| Apache Parquet 1.12.0 | active | — | — |
| Apache Hive 3.1.3 | active | — | — |
| Apache Impala 4.0.0 | active | — | — |
| pyarrow 13.0.0 | active | — | — |

## Workarounds

1. **In pyarrow, read with `pq.read_table(path, coerce_int96_timestamp_unit="ms")` so INT96 values are decoded at millisecond resolution rather than the default nanoseconds, which overflows for dates outside roughly 1677 to 2262.** (90% success)
   ```
   import pyarrow.parquet as pq

   # Decode INT96 timestamps as milliseconds instead of the default nanoseconds
   table = pq.read_table("data.parquet", coerce_int96_timestamp_unit="ms")
   ```
2. **In Spark, set `spark.sql.parquet.int96TimestampConversion` to `false` and `spark.sql.parquet.int96RebaseModeInRead` to `CORRECTED` before reading the file.** (85% success)
   ```
   # `spark` is an existing SparkSession
   spark.conf.set("spark.sql.parquet.int96TimestampConversion", "false")
   spark.conf.set("spark.sql.parquet.int96RebaseModeInRead", "CORRECTED")
   df = spark.read.parquet("data.parquet")
   ```
3. **Rewrite the Parquet file with a writer configured to store timestamps as INT64 millis instead of INT96 (e.g., Spark 3.x with `spark.sql.parquet.outputTimestampType` set to `TIMESTAMP_MILLIS`), then read the new file.** (95% success)
   ```
   # Spark writes INT96 by default, so the output type must be set explicitly
   spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MILLIS")
   spark.read.parquet("data.parquet").write.parquet("data_int64.parquet")
   ```
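
Whichever workaround is applied, it helps to confirm that the decoded values are plausible. The following is a minimal sketch of such a check. The function name `find_implausible` and the 1900 to 2100 bounds are assumptions chosen for illustration and should be adjusted to the dataset.

```python
from datetime import datetime


def find_implausible(timestamps, min_year=1900, max_year=2100):
    """Return (index, value) pairs whose year falls outside [min_year, max_year]."""
    return [
        (i, ts)
        for i, ts in enumerate(timestamps)
        if ts is not None and not (min_year <= ts.year <= max_year)
    ]


# Sample values: one normal timestamp, one far-future value, one null
sample = [datetime(2021, 3, 14, 9, 26), datetime(5138, 11, 16), None]
print(find_implausible(sample))
```

An empty result suggests the conversion is now correct. With pyarrow, the input list could come from something like `table.column("ts").to_pylist()`, where `ts` is a hypothetical column name.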

## Dead Ends

- **Casting the INT96 column to TIMESTAMP in SQL**: The CAST operation uses the same broken conversion logic, so it produces the same erroneous future dates. (95% fail)
- **Converting the INT96 column to STRING**: The result is often a binary representation (e.g., '\x00...') that is not human-readable and cannot be parsed into a date. (80% fail)
- **Upgrading the reader to a newer version**: The fault lies in the INT96 conversion logic, which can persist in newer versions, particularly when the file was written by a different tool (e.g., Spark) that uses a non-standard INT96 encoding. (60% fail)
