data
data_error
ai_generated
true
CSV parser silently trims leading/trailing whitespace from quoted fields
ID: data/csv-whitespace-trimming
85% fix rate
86% confidence
1 evidence item
2024-01-12 first seen
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| pandas 2.0.0 | active | — | — | — |
| Python csv module 3.11 | active | — | — | — |
| Apache Spark 3.4.0 | active | — | — | — |
Root cause analysis
Many CSV parsers and tools (e.g., pandas read_csv, Excel) trim whitespace from quoted fields under some settings, while others preserve it, so the same file can yield different values in different systems.
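As a rough illustration of how one parser setting can change the parsed values, the sketch below reads the same made-up text twice with Python's csv module and lists the cells that differ. It uses skipinitialspace, which only affects spaces directly after the delimiter, as a stand-in for the kind of difference between systems described here. The sample data and the helper names read_rows and differing_cells are assumptions for this example.

```python
import csv
import io

# Made-up sample: one quoted field with padding, one unquoted field
# with spaces after the delimiter.
SAMPLE = 'id,name\n1,"  Alice  "\n2,  Bob\n'


def read_rows(text, skipinitialspace):
    """Parse CSV text with the standard library reader."""
    return list(csv.reader(io.StringIO(text), skipinitialspace=skipinitialspace))


def differing_cells(rows_a, rows_b):
    """Return (row, column, value_a, value_b) for every cell that differs."""
    return [
        (r, c, a, b)
        for r, (row_a, row_b) in enumerate(zip(rows_a, rows_b))
        for c, (a, b) in enumerate(zip(row_a, row_b))
        if a != b
    ]


keeps_spaces = read_rows(SAMPLE, skipinitialspace=False)
skips_spaces = read_rows(SAMPLE, skipinitialspace=True)
mismatches = differing_cells(keeps_spaces, skips_spaces)
```

In this sample the quoted value keeps its padding under both settings, and only the unquoted field changes. A cell-by-cell comparison like this can help locate which fields two systems disagree on.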
Official documentation
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
Solutions
- Keep skipinitialspace=False (the default) when reading with pandas: df = pd.read_csv('file.csv', skipinitialspace=False)
- Wrap fields in quotes and use a parser that preserves whitespace: csv.reader(csvfile, skipinitialspace=False)
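The second approach can be sketched as a round trip with the standard library only: write every field quoted, read it back with skipinitialspace=False, and check that the padding survived. The rows are made up for illustration, and the in-memory buffer stands in for a real file. The pandas call above is not exercised here.

```python
import csv
import io

# Made-up rows; two values carry leading or trailing spaces.
rows = [["id", "name"], ["1", "  Alice  "], ["2", "Bob "]]

# newline="" leaves line endings to the csv module, as its docs recommend.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)  # quote every field
writer.writerows(rows)

# Read the same text back without skipping spaces after delimiters.
buffer.seek(0)
round_tripped = list(csv.reader(buffer, skipinitialspace=False))
```

A check such as round_tripped == rows can serve as a quick regression test when a pipeline changes parsers.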
Failed attempts
Common but ineffective approaches:
- Setting quoting=csv.QUOTE_NONE in Python's csv module (85% failure rate). This disables all quote handling, so quoted fields that contain commas are split apart.
- Adding a post-processing step to re-add whitespace based on the original file (70% failure rate). This does not change how the CSV is parsed; it only affects how the data is validated afterwards.
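To make the QUOTE_NONE problem concrete, the sketch below parses one made-up line twice with Python's csv module, once with the default quoting and once with quoting=csv.QUOTE_NONE. The sample line and the helper name parse_line are assumptions for this example.

```python
import csv
import io

# Made-up line: the quoted field contains both a comma and padding.
LINE = '1," Smith, John "\n'


def parse_line(text, **reader_options):
    """Return the first record parsed from the given CSV text."""
    return next(csv.reader(io.StringIO(text), **reader_options))


default_fields = parse_line(LINE)
unquoted_fields = parse_line(LINE, quoting=csv.QUOTE_NONE)

# default_fields  -> ['1', ' Smith, John ']
# unquoted_fields -> ['1', '" Smith', ' John "']
```

With the default settings the padding inside the quotes is already preserved. QUOTE_NONE splits the field at the embedded comma and leaves the literal quote characters in the data.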