data data_error ai_generated true

CSV parser silently trims leading/trailing whitespace from quoted fields

ID: data/csv-whitespace-trimming

Fix rate: 85%
Confidence: 86%
Evidence count: 1
First seen: 2024-01-12

Version compatibility

Version                   Status   Introduced   Deprecated   Notes
pandas 2.0.0              active
Python csv module 3.11    active
Apache Spark 3.4.0        active

Root cause analysis

CSV parsers and tools disagree on whether whitespace in quoted fields is significant. Some (this entry cites pandas read_csv and Excel) trim leading and trailing whitespace, depending on their settings, while others preserve it, so the same file can produce different values in different systems.
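To make the inconsistency concrete, here is a minimal sketch using only Python's standard library. The sample data and the function names are hypothetical, and the "trimming" consumer is simulated with str.strip() rather than taken from any real tool.

```python
import csv
import io

# Hypothetical sample: the quoted field carries two spaces on each side.
RAW = 'id,label\n1,"  padded  "\n'


def parse_preserving(text):
    """Parse with the stdlib reader, which keeps whitespace inside quotes."""
    return list(csv.reader(io.StringIO(text)))


def parse_trimming(text):
    """Stand-in for a trimming consumer: strip every cell after parsing."""
    return [[cell.strip() for cell in row] for row in parse_preserving(text)]


preserved = parse_preserving(RAW)
trimmed = parse_trimming(RAW)

print(repr(preserved[1][1]))  # '  padded  '
print(repr(trimmed[1][1]))    # 'padded'
print(preserved == trimmed)   # False: the two systems now disagree
```

A comparison like this, run on a small sample of a real file, is one way to check whether two pipelines treat whitespace the same way.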

generic

Official documentation

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

Solutions

  1. Use pandas with skipinitialspace=False (the default for read_csv): df = pd.read_csv('file.csv', skipinitialspace=False)
  2. Wrap fields in quotes and use a parser that preserves the whitespace inside them, such as Python's csv module: csv.reader(csvfile, skipinitialspace=False)
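The second solution can be sketched as a round trip with Python's csv module. This is an illustration under assumptions: the rows are made-up sample data, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Hypothetical rows; some values have significant leading/trailing spaces.
rows = [["id", "label"], ["1", "  padded  "], ["2", " a, b "]]

# Write every field wrapped in quotes so the whitespace is unambiguous.
buffer = io.StringIO(newline="")
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerows(rows)

# Read back with skipinitialspace=False (also the csv module's default).
buffer.seek(0)
round_tripped = list(csv.reader(buffer, skipinitialspace=False))

print(round_tripped == rows)  # True: whitespace inside quotes survives
```

Quoting every field costs a few bytes per cell but removes any doubt about where a value starts and ends.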

Failed attempts

Common but ineffective approaches:

  1. Setting quoting=csv.QUOTE_NONE in Python's csv module (85% failure rate)

    This disables all quoting and may break fields containing commas.

  2. Adding a post-processing step to re-add whitespace based on the original file (70% failure rate)

    Does not affect how the CSV is parsed, only how data is validated.
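The first failed attempt is easy to reproduce. In this minimal sketch (the input line is a hypothetical example), quoting=csv.QUOTE_NONE makes the reader treat quote characters as ordinary data, so a quoted field that contains a comma is split in two.

```python
import csv
import io

# Hypothetical line: one quoted field containing a comma and padding.
RAW = '1,"  a, b  "\n'

# Default quoting: the quoted field is kept whole, whitespace included.
default_row = next(csv.reader(io.StringIO(RAW)))

# QUOTE_NONE: quotes are literal characters and the comma splits the field.
quote_none_row = next(csv.reader(io.StringIO(RAW), quoting=csv.QUOTE_NONE))

print(default_row)     # ['1', '  a, b  ']
print(quote_none_row)  # ['1', '"  a', ' b  "']
```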