# CSV null vs. empty-string ambiguity: `""` and missing values both become NaN in pandas

- **ID:** `data/csv-null-vs-empty-string-ambiguity`
- **Domain:** data
- **Category:** data_error
- **Verification level:** ai_generated
- **Fix rate:** 82%

## Root cause

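By default, pandas `read_csv` treats both a quoted empty string (`""`) and a field with no value as NaN, so the distinction between an empty string and a null value is lost during parsing.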
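
A minimal reproduction (the in-memory CSV and the `name` column are made up for illustration) shows the collapse in both directions. With the defaults, the quoted `""` in row 1 and the missing value in row 2 both parse as NaN. With `keep_default_na=False` and no `na_values`, both arrive as `''` instead, so the two cases are still indistinguishable.

```python
import io

import pandas as pd

# Hypothetical sample: row 1 has a quoted empty string, row 2 has no value at all
csv_text = 'id,name\n1,""\n2,\n3,alice\n'

# Default NA handling: both cells become NaN
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df['name'].isna().tolist())  # → [True, True, False]

# NA detection for strings disabled: both cells become ''
raw_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(raw_df['name'].tolist())  # → ['', '', 'alice']
```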

## Version compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| pandas 1.5.3 | active | — | — |
| pandas 2.0.0 | active | — | — |
| pandas 2.1.4 | active | — | — |

## Solutions

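1. ```
   Use pd.read_csv(..., keep_default_na=False, dtype=str) so that no strings are parsed as NaN (adding na_values=[''] would turn empty cells back into NaN and make the later replace a no-op), then convert empty strings to pd.NA only in the columns where an empty value should mean null. This does not recover the per-cell distinction, because both "" and a missing field arrive as ''; it lets you choose the meaning per column. Example: df = pd.read_csv('data.csv', keep_default_na=False, dtype={'col1': str}); df['col1'] = df['col1'].replace('', pd.NA)
   ```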
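2. ```
   Pre-process the CSV by replacing empty quoted fields with a sentinel such as '__EMPTY__', read it normally (missing fields still become NaN), then map the sentinel back to an empty string: sed 's/""/__EMPTY__/g' input.csv > cleaned.csv, then in Python: df = pd.read_csv('cleaned.csv'); df = df.replace('__EMPTY__', ''). Caveat: "" is also how CSV escapes a literal quote inside a quoted field, so this blanket sed pattern can corrupt such fields.
   ```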
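
Solution 1 can be sketched as follows. This is a minimal, self-contained example built on assumptions: the in-memory CSV and the `nickname`/`middle_name` columns are hypothetical, and the rule that an empty `nickname` means null while an empty `middle_name` stays an empty string is an illustrative per-column policy, not something pandas decides for you.

```python
import io

import pandas as pd

# Hypothetical sample: row 1 has missing fields, row 2 has quoted empty strings
csv_text = 'id,nickname,middle_name\n1,,\n2,"",""\n3,bob,lee\n'

# keep_default_na=False with no na_values: nothing is parsed as NaN,
# so both "" and a missing field arrive as ''
df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, dtype=str)

# Hypothetical per-column policy: an empty nickname means "unknown" (null) ...
df['nickname'] = df['nickname'].replace('', pd.NA)
# ... while an empty middle_name is kept as a genuine empty string

print(df)
```

Because `keep_default_na=False` also stops strings such as `NA` or `null` from being parsed as missing, any such markers in real data would need the same explicit handling.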
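
Solution 2 can also be done without `sed`. The sketch below is a variant built on assumptions, not the original recipe: it swaps the blanket `sed` substitution for a Python regular expression that replaces `""` only when it forms an entire field, so doubled quotes inside a quoted value (such as `"say ""hi"""`) are left alone. The helper name `read_csv_keep_empty` and the `__EMPTY__` sentinel are hypothetical. The pattern remains a heuristic: it assumes comma delimiters and can still misfire on unusual quoted content, such as a field whose text contains `,",`.

```python
import io
import re

import pandas as pd

SENTINEL = '__EMPTY__'  # hypothetical marker; must not occur in the real data

# Matches "" only when it is an entire field: preceded by start-of-line or a
# comma, and followed by a comma or end-of-line (optionally after \r)
EMPTY_QUOTED_FIELD = re.compile(r'(^|,)""(?=,|\r?$)', re.MULTILINE)


def read_csv_keep_empty(csv_text: str) -> pd.DataFrame:
    """Hypothetical helper: "" becomes '' and a missing field becomes NaN."""
    # Mark whole-field empty quoted strings before pandas can see them
    marked = EMPTY_QUOTED_FIELD.sub(r'\g<1>' + SENTINEL, csv_text)
    # Default NA handling still turns genuinely missing fields into NaN
    df = pd.read_csv(io.StringIO(marked))
    # Map the sentinel back to an empty string, not to a null value
    return df.replace(SENTINEL, '')


csv_text = 'id,name\n1,""\n2,\n3,"say ""hi"""\n'
df = read_csv_keep_empty(csv_text)
print(df['name'].tolist())
```

Mapping the sentinel back to `''` rather than to `pd.NA` preserves the distinction, because missing fields are still parsed as NaN by the default NA handling.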

## Failed attempts

- **** — This makes pandas treat no-value cells as empty strings too, but still converts empty quoted strings to NaN. (65% failure rate)
- **** — Disables all NA detection, but also prevents legitimate NaN values from being recognized, breaking downstream null handling. (70% failure rate)
