# CSV parsing error: UnicodeDecodeError ('utf-8' codec can't decode byte) when reading an ISO-8859-1 encoded file as UTF-8

- **ID:** `data/csv-encoding-iso-8859-1-vs-utf-8`
- **Domain:** data
- **Category:** encoding_error
- **Error Code:** `UnicodeDecodeError`
- **Verification:** ai_generated
- **Fix Rate:** 95%

## Root Cause

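A CSV file encoded in ISO-8859-1 (Latin-1) stores each accented character as a single byte in the range 0x80 to 0xFF (for example, 'é' is 0xE9 and 'ñ' is 0xF1). Followed by ordinary ASCII text, these bytes are not valid UTF-8 sequences, so reading the file with a UTF-8 decoder (the default for `pandas.read_csv`) raises a `UnicodeDecodeError`.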
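
The failure can be reproduced without a file. The sketch below uses `José` as an illustrative sample value (an assumption, not taken from any real dataset) and only relies on Python's built-in `str.encode` and `bytes.decode`.

```python
# Illustrative sample value (an assumption, not from a real file).
text = "José"

latin1_bytes = text.encode("iso-8859-1")  # 'é' is stored as the single byte 0xE9
utf8_bytes = text.encode("utf-8")         # 'é' is stored as the two bytes 0xC3 0xA9

try:
    latin1_bytes.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    # 0xE9 announces a multi-byte UTF-8 sequence, but no continuation bytes follow.
    decode_error = exc

# Decoding with the encoding the bytes were written in round-trips cleanly.
roundtrip = latin1_bytes.decode("iso-8859-1")
```

The same bytes decode correctly once the decoder matches the encoding used to write them, which is what each workaround in this entry arranges in a different way.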

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| Python 3.10+ | active | — | — |
| pandas 2.0+ | active | — | — |
| Python csv module | active | — | — |

## Workarounds

1. **Detect the encoding with `chardet`, then pass it to `pandas.read_csv`.** Detection is a guess based on a byte sample, so check `result['confidence']` and spot-check the decoded text. (90% success)
   ```python
   import chardet
   import pandas as pd

   # Sample the first 10,000 bytes in binary mode so nothing is decoded yet.
   with open('file.csv', 'rb') as f:
       result = chardet.detect(f.read(10000))

   encoding = result['encoding']  # may be None if detection fails
   df = pd.read_csv('file.csv', encoding=encoding)
   ```
2. **Convert the file to UTF-8 with `iconv`, then read the converted file with the default UTF-8 encoding.** (95% success)
   ```shell
   iconv -f ISO-8859-1 -t UTF-8 original.csv > converted.csv
   ```
3. **Pass `encoding='ISO-8859-1'` to `pandas.read_csv`.** ISO-8859-1 maps every byte to a character, so this never raises, but it yields wrong characters if the file is actually in a different encoding. (100% success)
   ```python
   import pandas as pd

   df = pd.read_csv('file.csv', encoding='ISO-8859-1')
   ```
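
When pandas is not available, the same idea works with the standard `csv` module. The following is a minimal sketch, not a definitive implementation: `read_csv_rows` is a hypothetical helper name, and the sample file is created in a temporary directory purely for demonstration. It tries UTF-8 first and falls back to ISO-8859-1. Because ISO-8859-1 accepts any byte, it must come last in the list.

```python
import csv
import os
import tempfile


def read_csv_rows(path, encodings=("utf-8", "iso-8859-1")):
    """Hypothetical helper: return (rows, encoding) for the first encoding that decodes the file."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            with open(path, newline="", encoding=encoding) as f:
                # Reading all rows forces the whole file to be decoded.
                return list(csv.reader(f)), encoding
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next candidate
    raise last_error


# Demonstration: write a small ISO-8859-1 file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "w", newline="", encoding="iso-8859-1") as f:
        csv.writer(f).writerows([["name", "city"], ["José", "Logroño"]])
    rows, used_encoding = read_csv_rows(sample_path)
```

Trying UTF-8 first is a deliberate choice: a file that decodes cleanly as UTF-8 is very unlikely to be ISO-8859-1 text with accented characters, so the fallback only triggers when it is needed.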

## Dead Ends

- **Decoding with `errors='ignore'`** — Suppressing the error silently drops every undecodable byte, which corrupts the data. For example, 'José' becomes 'Jos'. (90% fail)
- **Re-saving the file as UTF-8 in Notepad++** — Notepad++ may misdetect the original encoding or double-encode characters, producing mojibake. (50% fail)
- **Opening and re-saving the file in Excel** — Excel may add a BOM, switch the delimiter to a semicolon depending on locale, or strip leading zeros from numeric fields. (70% fail)
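
The dropped characters and the mojibake described above are easy to reproduce. This sketch again uses `José` as an illustrative sample value. It simulates double encoding by decoding correct UTF-8 bytes as ISO-8859-1, which is one way such corruption can arise and not necessarily what any particular editor does internally.

```python
raw = "José".encode("iso-8859-1")  # b'Jos\xe9', as stored in the original file

# Suppressing the error discards the byte that UTF-8 cannot decode.
ignored = raw.decode("utf-8", errors="ignore")

# Double encoding: correct UTF-8 bytes are misread as ISO-8859-1,
# so the two bytes of 'é' turn into two separate characters.
mojibake = "José".encode("utf-8").decode("iso-8859-1")

print(repr(ignored))   # 'Jos'
print(repr(mojibake))  # 'JosÃ©'
```

Neither result raises an error, which is why these approaches are risky: the corruption only shows up later, when someone inspects the data.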