ArrowNotImplementedError
data
type_error
ai_generated
true
Parquet decimal precision overflow when reading into pandas
ID: data/parquet-decimal-overflow
85% Fix Rate
83% Confidence
1 Evidence
2024-01-10 First Seen
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| pyarrow 12.0.0 | active | — | — | — |
| pyarrow 14.0.1 | active | — | — | — |
| pandas 2.2.0 | active | — | — | — |
Root Cause
Parquet files store decimals with arbitrary precision (e.g., decimal(38,10)), but pandas converts them to float64 by default. A float64 holds only about 15 to 17 significant decimal digits, so higher-precision values are silently rounded, and values beyond the float64 range overflow.
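The precision loss described above can be reproduced without any Parquet file. The following is a minimal sketch using only the standard library; the sample value is an arbitrary illustration chosen to fit decimal(38,10), not data from a real file.

```python
from decimal import Decimal

# Illustrative value that fits decimal(38,10):
# 28 integer digits and 10 fractional digits.
exact = Decimal("1234567890123456789012345678.0123456789")

# Converting to a Python float (an IEEE 754 float64) keeps only
# about 15 to 17 significant decimal digits.
approx = float(exact)

# Converting the float back to Decimal exposes the rounding error.
round_tripped = Decimal(approx)

# Formatting the float as a string afterwards cannot restore the lost digits.
from_string = Decimal(str(approx))

print("exact        :", exact)
print("round-tripped:", round_tripped)
print("via str()    :", from_string)
print("lossless?    :", exact == round_tripped)
```

Once a value has passed through float64, the dropped digits are gone, which is why the fix has to be applied at read time rather than afterwards.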
Official Documentation
https://arrow.apache.org/docs/python/generated/pyarrow.parquet.read_table.html
Workarounds
- 90% success: Read with pyarrow and keep the Arrow decimal type when converting to pandas: `import pandas as pd; import pyarrow.parquet as pq; table = pq.read_table('data.parquet'); df = table.to_pandas(types_mapper=pd.ArrowDtype)`
- 82% success: Use pandas read_parquet with dtype_backend='pyarrow' (requires pandas 2.0 or later): `df = pd.read_parquet('data.parquet', dtype_backend='pyarrow')`
Dead Ends
Common approaches that don't work:
- 70% fail: This only preserves pandas-specific metadata like index names; it does not change the decimal-to-float conversion behavior.
- 85% fail: The overflow already occurred during reading; the string representation will show the truncated/rounded value.
- 75% fail: Fastparquet has the same limitation; it also converts decimals to float64 by default.