No explicit error; timestamp values off by timezone offset
Tags: data, data_error, ai_generated, partial

Parquet INT96 timestamps written by Hive are read with an incorrect timezone offset

ID: data/parquet-int96-timestamp-timezone

Fix Rate: 80%

Confidence: 86%

Evidence: 1

First Seen: 2023-09-05

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
Apache Hive 3.1.3	active	—	—	—
Apache Spark 3.2.0	active	—	—	—
Apache Impala 4.0.0	active	—	—	—

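Hive normalizes timestamps to UTC when it writes them as INT96, but many readers (e.g., older Spark, Impala) treat the stored value as local time. The result is no error message, only values shifted by the local UTC offset, which is typically one or more hours.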
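
The mismatch can be reproduced without any Parquet tooling. The sketch below assumes the commonly used INT96 layout (8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes of Julian day number). The helper names `encode_int96` and `decode_int96` and the UTC-5 writer zone are hypothetical and chosen only for illustration; this is not Hive's or Spark's actual code.

```python
import struct
from datetime import datetime, timedelta, timezone

JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian day number of 1970-01-01


def encode_int96(ts: datetime) -> bytes:
    """Pack a timezone-aware datetime into 12 bytes, normalized to UTC."""
    delta = ts - datetime(1970, 1, 1, tzinfo=timezone.utc)
    julian_day = JULIAN_DAY_OF_UNIX_EPOCH + delta.days
    nanos_of_day = (delta.seconds * 1_000_000 + delta.microseconds) * 1_000
    # 8 bytes nanoseconds-of-day, then 4 bytes Julian day, both little-endian
    return struct.pack("<qi", nanos_of_day, julian_day)


def decode_int96(raw: bytes) -> datetime:
    """Unpack 12 bytes into a naive datetime holding the stored wall-clock value."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return datetime(1970, 1, 1) + timedelta(days=days, microseconds=nanos_of_day // 1_000)


# Hypothetical writer session zone: fixed UTC-5, for illustration only.
writer_zone = timezone(timedelta(hours=-5))
local_event = datetime(2023, 9, 5, 12, 0, 0, tzinfo=writer_zone)

raw = encode_int96(local_event)   # the writer stores the UTC instant
stored = decode_int96(raw)        # naive value as found in the file

# Reader A converts the stored UTC value back to the writer's zone.
correct = stored.replace(tzinfo=timezone.utc).astimezone(writer_zone)
# Reader B treats the stored value as local time and applies no conversion.
shifted = stored.replace(tzinfo=writer_zone)

print(correct.isoformat())  # 2023-09-05T12:00:00-05:00
print(shifted.isoformat())  # 2023-09-05T17:00:00-05:00
print(shifted - correct)    # 5:00:00
```

Reader B reports a time five hours later than the original event, which matches the symptom described above: no error, only a shift equal to the UTC offset.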

generic

90% success: Read the files with Spark, setting spark.sql.parquet.int96TimestampConversion=true and spark.sql.session.timeZone=UTC to force correct conversion. Spark's documentation describes int96TimestampConversion as an adjustment for INT96 data written by Impala, so check the result against a few known timestamps.
```
spark.sql.parquet.int96TimestampConversion=true
spark.sql.session.timeZone=UTC
```
85% success: Rewrite the Parquet files with a tool such as Parquet-MR with Int96WriteSupport so that timestamps are explicitly stored in UTC.

Common approaches that don't work:

Setting only the Spark session timezone to UTC (60% fail)
This aligns how the reader displays timestamps, but it does not correct the underlying assumption that the INT96 value is in local time, so the offset is still applied incorrectly.
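Converting timestamps using date_add/date_sub with a fixed offset (70% fail)
The correct offset depends on the timezone and on daylight saving time, so a single fixed offset is wrong for part of the data. In Spark SQL and Hive, date_add and date_sub also shift by whole days, so they cannot express an hour offset at all.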
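
To illustrate why a fixed offset fails, here is a minimal sketch using Python's standard `zoneinfo` module (Python 3.9+, with system timezone data available). The zone `America/New_York`, the helper `utc_offset_hours`, and the sample dates are assumptions picked for the example; any zone that observes daylight saving behaves similarly.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Example zone with daylight saving, chosen for illustration only.
zone = ZoneInfo("America/New_York")


def utc_offset_hours(utc_naive: datetime) -> float:
    """Return the zone's UTC offset in hours at the given UTC instant."""
    aware = utc_naive.replace(tzinfo=timezone.utc).astimezone(zone)
    return aware.utcoffset() / timedelta(hours=1)


winter = datetime(2023, 1, 15, 12, 0)  # stored UTC value, standard time in effect
summer = datetime(2023, 7, 15, 12, 0)  # stored UTC value, daylight time in effect

print(utc_offset_hours(winter))  # -5.0
print(utc_offset_hours(summer))  # -4.0

# A correction derived from the winter offset, applied to the summer value.
FIXED_SHIFT = timedelta(hours=-5)
fixed_summer = summer + FIXED_SHIFT

# A zone-aware conversion of the same summer value.
zone_aware_summer = (
    summer.replace(tzinfo=timezone.utc).astimezone(zone).replace(tzinfo=None)
)

print(zone_aware_summer - fixed_summer)  # 1:00:00
```

The fixed shift is off by one hour for the summer row. Spark SQL's `from_utc_timestamp` and `to_utc_timestamp` take a zone name rather than a fixed offset, which avoids this class of error.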