# CSV null vs empty string ambiguity: "" and no-value both become NaN in pandas

- **ID:** `data/csv-null-vs-empty-string-ambiguity`
- **Domain:** data
- **Category:** data_error
- **Verification:** ai_generated
- **Fix Rate:** 82%

## Root Cause

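By default, pandas `read_csv` parses both an empty quoted string (`""`) and a missing field as NaN. Quoting is not retained once a field is parsed, so both arrive as the empty string, and the empty string is in the default list of NA markers. The distinction between an empty string and a null value is therefore lost at read time.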
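The behavior can be reproduced on an in-memory sample. This is a minimal sketch; the two-row CSV text and the variable names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample: row 1 holds a quoted empty string, row 2 holds no value at all.
CSV_TEXT = 'id,name\n1,""\n2,\n'

# Default settings: both cells are reported as missing.
df_default = pd.read_csv(io.StringIO(CSV_TEXT))
print(df_default["name"].isna().tolist())

# With the default NA markers switched off, both cells come back as ''.
# Either way the two inputs are indistinguishable after parsing.
df_raw = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False, dtype=str)
print(df_raw["name"].tolist())
```

Whichever setting is used, the two rows end up with the same value, which is why the workarounds act either before parsing or per column afterwards.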

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| pandas 1.5.3 | active | — | — |
| pandas 2.0.0 | active | — | — |
| pandas 2.1.4 | active | — | — |

## Workarounds

1. **Read every field as a string with `keep_default_na=False` and `dtype=str`, then convert empty strings to `pd.NA` only in the columns where empty means missing.** Do not pass `na_values=['']`, which re-enables the conversion of empty fields to NaN. This does not recover the distinction between `""` and a missing field (both arrive as `''`); it only makes the choice explicit per column. (85% success)
   ```python
   import pandas as pd

   # Every empty field, quoted or not, is read as '' instead of NaN.
   df = pd.read_csv('data.csv', keep_default_na=False, dtype=str)
   # Opt in to nulls per column: here an empty 'col1' cell means "missing".
   df['col1'] = df['col1'].replace('', pd.NA)
   ```
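2. **Pre-process the CSV so that a whole-field `""` becomes a sentinel such as `'__EMPTY__'`, read it with the default NA handling so unquoted empty fields become NaN, then map the sentinel back to `''`.** A blanket `sed 's/""/.../g'` also rewrites the escaped quotes inside fields such as `"say ""hi"""`, so match `""` only when it is the entire field. The sentinel must not occur in the real data. (78% success)
   ```python
   import io
   import re
   import pandas as pd

   with open('input.csv', newline='') as f:
       raw = f.read()
   # Replace "" only when it is bounded by a comma, a line break, or the file edge.
   marked = re.sub(r'(?<![^,\n])""(?![^,\r\n])', '__EMPTY__', raw)
   df = pd.read_csv(io.StringIO(marked)).replace('__EMPTY__', '')
   ```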
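The sentinel approach can be checked end to end on in-memory data. The sketch below is one possible shape, not a definitive implementation. The helper `read_csv_keep_empty`, the `__EMPTY__` marker, and the sample rows are all hypothetical. It assumes comma-delimited input in which no quoted field contains a bare `""` directly after a comma or line break.

```python
import io
import re

import pandas as pd

SENTINEL = "__EMPTY__"  # hypothetical marker; must never occur in the real data
# Matches "" only when it is a whole field: bounded by a comma, a line break, or the file edge.
WHOLE_FIELD_EMPTY = re.compile(r'(?<![^,\n])""(?![^,\r\n])')


def read_csv_keep_empty(text: str) -> pd.DataFrame:
    """Parse CSV text so that "" stays '' and an unquoted empty field becomes NaN."""
    if SENTINEL in text:
        raise ValueError("sentinel already present in input")
    marked = WHOLE_FIELD_EMPTY.sub(SENTINEL, text)
    # Default NA handling turns the remaining (unquoted) empty fields into NaN.
    df = pd.read_csv(io.StringIO(marked))
    # Restore the quoted empties as real empty strings.
    return df.replace(SENTINEL, "")


# Sample with a quoted empty, an unquoted empty, and a field containing escaped quotes.
sample = 'id,name,note\n1,"",plain\n2,,"say ""hi"""\n3,bob,\n'
df = read_csv_keep_empty(sample)
print(df["name"].tolist())
print(df["note"].tolist())
```

In this sample the `name` column keeps `''` for row 1 and NaN for row 2, and the escaped quotes in `note` survive untouched. Because the default NA markers stay active, literal values such as `NA` or `null` are still read as missing.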

## Dead Ends

- **** — This makes pandas treat no-value cells as empty strings too, but still converts empty quoted strings to NaN. (65% fail)
- **** — Disables all NA detection, but also prevents legitimate NaN values from being recognized, breaking downstream null handling. (70% fail)
