data
data_error
ai_generated
true
CSV parser silently trims leading/trailing whitespace from quoted fields
ID: data/csv-whitespace-trimming
85%Fix Rate
86%Confidence
1Evidence
2024-01-12First Seen
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| pandas 2.0.0 | active | — | — | — |
| Python csv module 3.11 | active | — | — | — |
| Apache Spark 3.4.0 | active | — | — | — |
Root Cause
Many CSV parsers (e.g., pandas read_csv, Excel) trim whitespace from quoted fields by default, but some do not, causing data inconsistency between systems.
generic中文
许多CSV解析器(例如pandas read_csv、Excel)默认从带引号的字段中删除空白,但有些不会,导致系统间数据不一致。
Official Documentation
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.htmlWorkarounds
-
95% success Use pandas with skipinitialspace=False: df = pd.read_csv('file.csv', skipinitialspace=False)
Use pandas with skipinitialspace=False: df = pd.read_csv('file.csv', skipinitialspace=False) -
90% success Wrap fields in quotes and use a parser that preserves whitespace: csv.reader(csvfile, skipinitialspace=False)
Wrap fields in quotes and use a parser that preserves whitespace: csv.reader(csvfile, skipinitialspace=False)
中文步骤
Use pandas with skipinitialspace=False: df = pd.read_csv('file.csv', skipinitialspace=False)Wrap fields in quotes and use a parser that preserves whitespace: csv.reader(csvfile, skipinitialspace=False)
Dead Ends
Common approaches that don't work:
-
Setting quoting=csv.QUOTE_NONE in Python's csv module
85% fail
This disables all quoting and may break fields containing commas.
-
Adding a post-processing step to re-add whitespace based on original file
70% fail
Does not affect how the CSV is parsed, only how data is validated.