data
data_error
ai_generated
true
CSV float precision loss when reading with pandas read_csv
ID: data/csv-float-precision-loss
88% fix rate
85% confidence
1 evidence item
First seen: 2023-06-15
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| pandas 1.5.3 | active | — | — | — |
| pandas 2.0.0 | active | — | — | — |
| pandas 2.1.0 | active | — | — | — |
Root cause analysis
By default, pandas read_csv parses decimal numbers as float64, which holds only about 15 to 17 significant decimal digits. Values with more digits are silently rounded to the nearest representable float64, so high-precision data such as scientific measurements or financial transactions loses precision without any warning or error.
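The loss is easy to reproduce. The following is a minimal sketch, assuming a made-up one-column CSV (the column name `amount` and the value are illustrative, not taken from a real dataset). It compares the exact binary value pandas stored against the text that was in the file.

```python
import io
from decimal import Decimal

import pandas as pd

# Hypothetical CSV content: one value with 20 significant digits.
csv_text = "amount\n1234567890.1234567891\n"

# Default behaviour: the column is inferred as float64.
df = pd.read_csv(io.StringIO(csv_text))
parsed = df["amount"].iloc[0]

# The value exactly as written in the file.
exact = Decimal("1234567890.1234567891")

# Decimal(float) expands the exact binary value that float64 holds,
# so any difference from `exact` is precision lost during parsing.
stored = Decimal(float(parsed))
precision_lost = stored != exact

print(df["amount"].dtype)
print(stored)
print(precision_lost)
```

No exception or warning is raised at any point, which is why the problem tends to surface only later, for example when totals fail to reconcile.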
Official documentation
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
Solutions
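- Parse critical columns straight into Decimal with a converter. read_csv does not accept `decimal.Decimal` as a `dtype`, so pass it through `converters` instead: `from decimal import Decimal; df = pd.read_csv('data.csv', converters={'amount': Decimal})`
- Read the column as a string and convert to Decimal afterwards: `from decimal import Decimal; df = pd.read_csv('data.csv', dtype={'amount': str}); df['amount'] = df['amount'].apply(Decimal)`
- Use NumPy's extended-precision float if available: `df = pd.read_csv('data.csv', dtype={'amount': np.longdouble})`. This type is platform-dependent (it is exposed as `np.float128` only on some platforms, and on Windows it is typically no wider than float64), so at best it narrows the loss rather than eliminating it. Verify that the extra digits actually survive parsing before relying on it.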
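The two Decimal-based approaches above can be sketched as follows. This is an illustration under stated assumptions: the CSV content and the column names `id` and `amount` are hypothetical, and the sample has no empty cells (an empty string passed to `Decimal` raises an error, so real data with missing values would need a guard).

```python
import io
from decimal import Decimal

import pandas as pd

# Hypothetical CSV content with values longer than float64 can hold.
csv_text = (
    "id,amount\n"
    "1,1234567890.1234567891\n"
    "2,0.10000000000000000001\n"
)

# Option A: keep the column as text, then convert each cell to Decimal.
df_str = pd.read_csv(io.StringIO(csv_text), dtype={"amount": str})
df_str["amount"] = df_str["amount"].apply(Decimal)

# Option B: let read_csv call Decimal on each raw cell string.
df_conv = pd.read_csv(io.StringIO(csv_text), converters={"amount": Decimal})

# Both columns now hold Python Decimal objects (object dtype),
# with every digit from the file intact.
print(df_str["amount"].tolist())
print(df_conv["amount"].tolist())
```

Only the listed column is affected; other columns such as `id` are still inferred as usual. The trade-off is that an object column of Decimal values is slower and uses more memory than a float64 column, so it is worth limiting this to the columns that genuinely need exact digits.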
Failed attempts
Common but ineffective approaches:
- 60% failure rate: This disables all numeric processing and may break downstream operations that expect float types; it also increases memory usage significantly.
- 90% failure rate: Rounding cannot recover lost precision; the original value has already been rounded to float64 during CSV parsing.
- 80% failure rate: float64 is already the default, so requesting it explicitly changes nothing and the value is still rounded; a higher-precision type such as Decimal is needed.
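The reasoning behind the rounding and float64 entries can be checked with a short sketch. It assumes a made-up CSV value and column name (`amount`), and shows that an explicit float64 dtype yields the same value as the default and that rounding afterwards does not bring the original digits back.

```python
import io
from decimal import Decimal

import pandas as pd

# Hypothetical CSV content: one value with 20 significant digits.
csv_text = "amount\n0.12345678901234567891\n"
original = Decimal("0.12345678901234567891")

# Default parse versus an explicit float64 dtype.
default_val = pd.read_csv(io.StringIO(csv_text))["amount"].iloc[0]
explicit_val = pd.read_csv(
    io.StringIO(csv_text), dtype={"amount": "float64"}
)["amount"].iloc[0]
same_result = default_val == explicit_val

# Rounding after the fact works on the already-rounded float64,
# so the digits beyond float64's precision are not restored.
rounded = round(float(default_val), 20)
recovered = Decimal(repr(rounded)) == original

print(same_result)
print(recovered)
```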