data data_error ai_generated true

CSV float precision loss when reading/writing with pandas read_csv

ID: data/csv-float-precision-loss

Fix Rate: 88%
Confidence: 85%
Evidence: 1
First Seen: 2023-06-15

Version Compatibility

Version        Status   Introduced   Deprecated   Notes
pandas 1.5.3   active   -            -            -
pandas 2.0.0   active   -            -            -
pandas 2.1.0   active   -            -            -

Root Cause

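By default, pandas read_csv parses decimal numbers into float64 (IEEE 754 double precision), which holds only about 15-17 significant decimal digits. A value written with more digits is rounded to the nearest representable float64 without any warning, so high-precision data such as scientific measurements or financial amounts silently loses its trailing digits.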
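The loss described above can be observed directly. The sketch below is illustrative only: the in-memory CSV, its `amount` column, and the helper name `load_default` are assumptions made for the example, not part of the original report.

```python
import io
from decimal import Decimal

import pandas as pd

# Hypothetical one-column CSV; the value carries 21 significant digits.
CSV_TEXT = "amount\n0.123456789012345678901\n"
EXACT = Decimal("0.123456789012345678901")


def load_default(text: str) -> pd.DataFrame:
    """Parse with pandas defaults, so 'amount' is inferred as float64."""
    return pd.read_csv(io.StringIO(text))


df = load_default(CSV_TEXT)
parsed = float(df["amount"].iloc[0])  # plain Python float for a clean repr

print(df["amount"].dtype)  # float64
print(repr(parsed))        # shortest repr of the stored double
# The stored double no longer equals the text that was in the file.
print(Decimal(repr(parsed)) == EXACT)  # False
```

No exception or warning is raised at any point, which is why the problem tends to surface only when totals or comparisons disagree later on.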

Official Documentation

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

Workarounds

  1. 90% success Parse critical columns directly into Decimal. read_csv does not accept decimal.Decimal as a dtype, so pass it as a converter instead: `from decimal import Decimal; df = pd.read_csv('data.csv', converters={'amount': Decimal})`
  2. 88% success Read the CSV as strings and convert to Decimal afterwards: `from decimal import Decimal; df = pd.read_csv('data.csv', dtype=str); df['amount'] = df['amount'].apply(Decimal)`
  3. 75% success Use numpy.float128 if available: `import numpy as np; df = pd.read_csv('data.csv', dtype={'amount': np.float128})`. Note that np.float128 does not exist on every platform (for example, Windows builds of NumPy), is usually 80-bit extended precision rather than true quadruple precision, and read_csv may still parse the text as float64 before casting, so verify that the extra digits actually survive.
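The two Decimal-based workarounds can be sketched end to end as follows. This is a minimal example under stated assumptions: the CSV content, the `id` and `amount` column names, and both helper function names are hypothetical. It relies on the `converters` parameter of read_csv, which passes each cell's raw text to the given callable.

```python
import io
from decimal import Decimal

import pandas as pd

# Hypothetical sample data with more digits than float64 can hold.
CSV_TEXT = (
    "id,amount\n"
    "1,0.123456789012345678901\n"
    "2,1234567890.12345678901\n"
)


def load_with_converter(text: str) -> pd.DataFrame:
    """Parse 'amount' straight from its CSV text into Decimal."""
    return pd.read_csv(io.StringIO(text), converters={"amount": Decimal})


def load_as_str_then_convert(text: str) -> pd.DataFrame:
    """Read every column as text, then convert only the critical column."""
    df = pd.read_csv(io.StringIO(text), dtype=str)
    df["amount"] = df["amount"].apply(Decimal)
    return df


df_a = load_with_converter(CSV_TEXT)
df_b = load_as_str_then_convert(CSV_TEXT)

print(df_a["amount"].iloc[0])  # every digit from the file is preserved
print(df_a["amount"].dtype)    # object: the column holds Decimal instances
```

Either way the column ends up with object dtype, so vectorized float operations are not available on it and arithmetic runs at Python speed. If float64 precision is sufficient and only the accuracy of the text-to-float conversion matters, the documented `float_precision='round_trip'` option of read_csv is a lighter alternative worth testing.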

Dead Ends

Common approaches that don't work:

  1. 60% fail

    Reading every column as a non-numeric type and leaving it unconverted disables all numeric processing and may break downstream operations that expect float types; it also increases memory usage significantly.

  2. 90% fail

    Rounding the values after loading cannot recover lost precision: each value was already rounded to the nearest float64 when the CSV was parsed, so the original digits are gone.

  3. 80% fail

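    Specifying float64 explicitly changes nothing: float64 is already the default and has the same 15-17 significant digit limit. A higher-precision representation such as Decimal (or float128 where it is available) is needed.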
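The notes on rounding and on float64 above can be checked with a short experiment. The CSV text and the `amount` column are assumptions made for illustration; the sketch only shows that neither approach brings back the digits that were in the file.

```python
import io
from decimal import Decimal

import pandas as pd

# Hypothetical value with 21 significant digits.
CSV_TEXT = "amount\n0.123456789012345678901\n"
EXACT = Decimal("0.123456789012345678901")

default_df = pd.read_csv(io.StringIO(CSV_TEXT))
explicit_df = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"amount": "float64"})

# Requesting float64 explicitly yields the same double as the default parse.
same_as_default = bool(
    default_df["amount"].iloc[0] == explicit_df["amount"].iloc[0]
)

# Rounding after the fact operates on the already-rounded double.
rounded = float(default_df["amount"].round(21).iloc[0])
recovered = Decimal(repr(rounded)) == EXACT

print(same_as_default)  # True
print(recovered)        # False
```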