data
data_error
ai_generated
true
CSV null vs empty string ambiguity: "" and no-value both become NaN in pandas
ID: data/csv-null-vs-empty-string-ambiguity
Fix Rate: 82%
Confidence: 85%
Evidence: 1
First Seen: 2023-03-15
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| pandas 1.5.3 | active | — | — | — |
| pandas 2.0.0 | active | — | — | — |
| pandas 2.1.4 | active | — | — | — |
Root Cause
Pandas read_csv treats both empty quoted strings and missing fields as NaN by default, losing the distinction between empty strings and null values.
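A small reproduction may make the loss of information concrete. This is a minimal sketch, assuming an in-memory CSV with hypothetical columns `id` and `name`, where row 1 holds a quoted empty string and row 2 holds no value at all:

```python
import io

import pandas as pd

# Hypothetical sample: row 1 has a quoted empty string, row 2 has no value.
CSV_TEXT = 'id,name\n1,""\n2,\n3,alice\n'

# Default settings: both kinds of empty field are parsed as NaN.
default_df = pd.read_csv(io.StringIO(CSV_TEXT))
null_mask = default_df["name"].isna().tolist()

# keep_default_na=False: both kinds of empty field are parsed as ''.
raw_df = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False, dtype=str)
raw_names = raw_df["name"].tolist()

print(null_mask)   # [True, True, False]
print(raw_names)   # ['', '', 'alice']
```

In both cases rows 1 and 2 end up with the same value, so the original distinction cannot be recovered from the DataFrame alone.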
Official Documentation
https://pandas.pydata.org/docs/user_guide/io.html#io-read-csv-table
Workarounds
- 85% success: Use pd.read_csv(..., keep_default_na=False, dtype=str) so that empty fields load as empty strings rather than NaN, then convert empty strings to pd.NA only in the columns where an empty value means null. Do not also pass na_values=['']: that option turns empty fields back into NaN at read time, so the later replace has nothing left to convert. Quoted and unquoted empty fields still load as the same value with this approach. Example: df = pd.read_csv('data.csv', keep_default_na=False, dtype={'col1': str}); df['col1'] = df['col1'].replace('', pd.NA)
- 78% success: Pre-process the CSV by replacing empty quoted fields with a sentinel such as '__NULL__', then map the sentinel back after reading. The shell step and the Python step run separately (marked.csv is a placeholder name): sed 's/""/__NULL__/g' input.csv > marked.csv, then df = pd.read_csv('marked.csv'); df = df.replace('__NULL__', pd.NA). Note that this sed pattern also rewrites escaped quotes ("") inside quoted fields.
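The sentinel workaround above can also be sketched entirely in Python. This is a heuristic under stated assumptions, not a full CSV parser: the helper name `read_csv_keeping_empty_strings`, the sentinel `__EMPTY__`, and the sample data are all hypothetical, and the regular expression only targets a `""` that occupies a whole comma-delimited field. A quoted field whose content contains `,"",` would still be misread. Unlike the one-liner in the workaround, the sketch maps the sentinel back to an empty string, so quoted empties stay distinct from missing fields, which remain NaN.

```python
import io
import re

import pandas as pd

# Hypothetical sentinel; it must not occur in the real data.
SENTINEL = "__EMPTY__"

# Match "" only when it fills a whole field: preceded by start of text,
# a comma, or a newline, and followed by a comma, a line ending, or end of text.
WHOLE_FIELD_EMPTY_QUOTES = re.compile(r'(?<![^,\n])""(?![^,\r\n])')


def read_csv_keeping_empty_strings(text: str) -> pd.DataFrame:
    """Read CSV text so that "" becomes '' and a missing field stays NaN."""
    marked = WHOLE_FIELD_EMPTY_QUOTES.sub(SENTINEL, text)
    df = pd.read_csv(io.StringIO(marked), dtype=str)
    # Map the sentinel back to a real empty string after parsing.
    return df.replace(SENTINEL, "")


sample = 'id,name\n1,""\n2,\n3,alice\n'
df = read_csv_keeping_empty_strings(sample)
print(df["name"].isna().tolist())  # [False, True, False]
```

Reading everything as `str` keeps the sentinel from interfering with type inference; numeric columns can be converted afterwards once the empty and missing cases have been separated.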
Dead Ends
Common approaches that don't work:
- 65% fail: This makes pandas treat no-value cells as empty strings too, but still converts empty quoted strings to NaN.
- 70% fail: Disables all NA detection, but also prevents legitimate NaN values from being recognized, breaking downstream null handling.
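The last dead end does not name the option involved. Assuming it refers to `na_filter=False` (an assumption, since the source omits the name), the following sketch with hypothetical sample data shows how turning off NA detection affects a numeric column that has a missing value:

```python
import io

import pandas as pd

# Hypothetical sample: the score for id 2 is missing.
CSV_TEXT = "id,score\n1,1.5\n2,\n3,2.0\n"

# Assumption: na_filter=False is the option the dead end describes.
df = pd.read_csv(io.StringIO(CSV_TEXT), na_filter=False)

scores = df["score"].tolist()
missing_count = int(df["score"].isna().sum())
is_numeric = pd.api.types.is_numeric_dtype(df["score"])

print(scores)         # ['1.5', '', '2.0']
print(missing_count)  # 0
print(is_numeric)     # False
```

Because the empty field is kept as a string, the whole column falls back to text, and checks such as `isna()` no longer find the missing score.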