# Decimal Precision Overflow When Reading Parquet into pandas

- **ID:** `data/parquet-decimal-overflow`
- **Domain:** data
- **Category:** type_error
- **Error code:** `ArrowNotImplementedError`
- **Verification level:** ai_generated
- **Fix rate:** 85%

## Root Cause

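Parquet files store decimals with a declared precision and scale (for example decimal(38,10)). When such a column ends up as float64 on the way into pandas, any value that needs more than the roughly 15 to 17 significant digits float64 can hold is rounded, so large or high-precision values overflow the available precision.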
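The size of the loss can be illustrated without any Parquet file at all. The sketch below uses only the standard library and a made-up decimal(38,10) value (the number itself is an assumption chosen for illustration) to show what a float64 conversion does to it.

```python
from decimal import Decimal

# Hypothetical value that uses all 38 digits of a decimal(38,10) column.
exact = Decimal("1234567890123456789012345678.0123456789")

# Converting to float64 keeps only about 15 to 17 significant digits.
as_float = float(exact)

# Decimal(float) gives the exact value the float actually stores.
stored = Decimal(as_float)

# Absolute error introduced by the conversion.
error = abs(stored - exact)
precision_lost = stored != exact

print(f"exact : {exact}")
print(f"stored: {stored}")
print(f"error : {error}")
```

The printed error is far larger than the 10 fractional digits the column is supposed to guarantee, which is why the fix has to happen at read time.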

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| pyarrow 12.0.0 | active | — | — |
| pyarrow 14.0.1 | active | — | — |
| pandas 2.2.0 | active | — | — |

## Solutions

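1. Read with pyarrow and map the decimal type to an Arrow-backed pandas dtype:
   ```python
   import pandas as pd
   import pyarrow as pa
   import pyarrow.parquet as pq

   table = pq.read_table('data.parquet')
   dec = pa.decimal128(38, 10)
   # types_mapper must be a callable; dict.get returns None for unmapped types
   df = table.to_pandas(types_mapper={dec: pd.ArrowDtype(dec)}.get)
   ```
2. Use pandas `read_parquet` with `dtype_backend='pyarrow'` (requires pandas 2.0 or later):
   ```python
   import pandas as pd

   df = pd.read_parquet('data.parquet', dtype_backend='pyarrow')
   ```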

## Failed Attempts

- **Relying on pandas metadata preservation:** This only preserves pandas-specific metadata such as index names; it does not change the decimal-to-float conversion behavior. (70% failure rate)
- **Casting the column to string after reading:** The precision was already lost during reading, so the string representation shows the truncated or rounded value. (85% failure rate)
- **Switching to the fastparquet engine:** fastparquet has the same limitation; it also converts decimals to float64 by default. (75% failure rate)
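Why after-the-fact conversions cannot help can be shown with two different decimals that collapse to the same float64. This is a standard-library sketch with made-up values, not output from a real Parquet read.

```python
from decimal import Decimal

# Two distinct hypothetical decimal(38,10) values that differ only in the fraction.
a = Decimal("1234567890123456789012345678.0123456789")
b = Decimal("1234567890123456789012345678.9876543210")

# After a float64 conversion they are indistinguishable.
fa, fb = float(a), float(b)
same_float = fa == fb

# Casting to string afterwards only formats the already-rounded float.
same_string = str(fa) == str(fb)

print(a != b, same_float, same_string)
```

Once two source values map to one float, no cast applied later can tell them apart, so the decimal type has to be preserved during the read itself.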
