# Parquet INT96 timestamps written by Hive are read with an incorrect timezone offset

- **ID:** `data/parquet-int96-timestamp-timezone`
- **Domain:** data
- **Category:** data_error
- **Error Code:** `No explicit error; timestamp values off by timezone offset`
- **Verification:** ai_generated
- **Fix Rate:** 80%

## Root Cause

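Hive normalizes INT96 timestamps to UTC when writing, but some readers (e.g., older Spark, Impala) interpret the stored value as local wall-clock time. The result is no error, only values shifted by the reader's UTC offset, usually a whole number of hours.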
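The shift can be reproduced at the byte level. The following is a minimal sketch, assuming the usual INT96 layout (8 little-endian bytes of nanoseconds within the day followed by a 4-byte little-endian Julian day number). `decode_int96` and the UTC-4 reader zone are illustrative names and values, not part of Hive, Spark, or Impala.

```python
import struct
from datetime import date, datetime, timedelta, timezone

JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01


def decode_int96(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 value into a naive datetime (no zone attached)."""
    # "<qi" = little-endian int64 (nanoseconds of day) + int32 (Julian day), 12 bytes.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return datetime(1970, 1, 1) + timedelta(days=days, microseconds=nanos_of_day // 1000)


# Bytes for 2021-07-01 12:00:00 as a UTC-normalizing writer would store them.
julian_day = (date(2021, 7, 1) - date(1970, 1, 1)).days + JULIAN_DAY_OF_UNIX_EPOCH
raw = struct.pack("<qi", 12 * 3600 * 1_000_000_000, julian_day)

stored = decode_int96(raw)

# Reader A treats the stored value as UTC (what the writer intended).
correct = stored.replace(tzinfo=timezone.utc)

# Reader B treats the same value as local wall-clock time (assumed zone: UTC-4).
reader_zone = timezone(timedelta(hours=-4))
misread = stored.replace(tzinfo=reader_zone)

# Aware datetimes subtract as instants, so this is the skew between the readers.
skew = misread - correct
print(skew)  # 4:00:00
```

Both readers decode identical bytes; only the zone they attach afterwards differs, which is why the problem surfaces as silently shifted values rather than as a read failure.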

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| Apache Hive 3.1.3 | active | — | — |
| Apache Spark 3.2.0 | active | — | — |
| Apache Impala 4.0.0 | active | — | — |

## Workarounds

1. **In Spark, set `spark.sql.parquet.int96TimestampConversion=true` and `spark.sql.session.timeZone=UTC` so INT96 values are converted consistently on read.** (90% success)
   ```
   spark.sql.parquet.int96TimestampConversion=true
   spark.sql.session.timeZone=UTC
   ```
   Spark documents the conversion flag as an adjustment for INT96 data written by Impala, so confirm the result against a row whose true timestamp is known.
2. **Rewrite Parquet files using a tool like Parquet-MR with Int96WriteSupport to explicitly store timestamps in UTC.** (85% success)
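What "explicitly store timestamps in UTC" means for the written bytes can be sketched as follows. This is not Parquet-MR code: `encode_int96_utc` is a hypothetical helper, and the layout (nanoseconds of day plus Julian day, little-endian) is the assumed INT96 encoding. The point is that the writer converts to UTC before encoding and refuses values whose zone is unknown.

```python
import struct
from datetime import datetime, timedelta, timezone

JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_int96_utc(ts: datetime) -> bytes:
    """Encode an aware datetime as 12 INT96 bytes, normalized to UTC."""
    if ts.tzinfo is None:
        # A naive value has no defined instant, so normalizing it would be a guess.
        raise ValueError("timestamp must carry an explicit timezone")
    delta = ts.astimezone(timezone.utc) - UNIX_EPOCH
    # divmod on timedeltas floors, so the remainder is always within one day.
    days, remainder = divmod(delta, timedelta(days=1))
    nanos_of_day = (remainder // timedelta(microseconds=1)) * 1000
    return struct.pack("<qi", nanos_of_day, days + JULIAN_DAY_OF_UNIX_EPOCH)


# The same instant expressed in two zones encodes to identical bytes.
in_utc = datetime(2021, 7, 1, 12, 0, tzinfo=timezone.utc)
in_local = datetime(2021, 7, 1, 8, 0, tzinfo=timezone(timedelta(hours=-4)))
same_bytes = encode_int96_utc(in_utc) == encode_int96_utc(in_local)
print(same_bytes)  # True
```

Rewritten files only help if every downstream reader also treats the stored value as UTC, so the rewrite is best paired with a check against a known timestamp.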

## Dead Ends

- **Setting only the Spark session timezone to UTC** — This aligns how the reader displays timestamps, but on its own it does not change the reader's assumption that INT96 values are in local time, so the offset is still applied incorrectly. (60% fail)
- **Converting timestamps using date_add/date_sub with fixed offset** — The offset may vary by timezone and daylight saving, making a fixed offset incorrect for many cases. (70% fail)
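The fixed-offset dead end can be illustrated with plain arithmetic. This is a minimal sketch, assuming a reader in US Eastern time, where the UTC offset is -5 hours in winter (EST) and -4 hours in summer (EDT). The helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed UTC offsets for a US Eastern reader.
EST = timedelta(hours=-5)  # winter
EDT = timedelta(hours=-4)  # summer (daylight saving)


def misread_as_local(stored_utc: datetime, reader_offset: timedelta) -> datetime:
    """UTC instant a reader reports when it treats a UTC value as local time."""
    return stored_utc - reader_offset


def apply_fixed_correction(value: datetime, assumed_offset: timedelta) -> datetime:
    """Shift a misread value by one constant offset, as a fixed-offset fix would."""
    return value + assumed_offset


winter = datetime(2021, 1, 15, 12, 0)
summer = datetime(2021, 7, 15, 12, 0)

# The correction is hard-coded to EST for both rows.
winter_fixed = apply_fixed_correction(misread_as_local(winter, EST), EST)
summer_fixed = apply_fixed_correction(misread_as_local(summer, EDT), EST)

winter_error = winter_fixed - winter
summer_error = summer_fixed - summer
print(winter_error)  # 0:00:00
print(summer_error)  # -1 day, 23:00:00 (one hour too early)
```

A constant that is right for winter rows leaves summer rows one hour off, so any correction has to be computed per row from the timezone rules rather than from a single offset.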
