# CSV parser fails to recognize quoted fields when file starts with UTF-8 BOM

- **ID:** `data/csv-utf8-bom-quote-mismatch`
- **Domain:** data
- **Category:** encoding_error
- **Error Code:** `csv.Error: field larger than field limit (131072)`
- **Verification:** ai_generated
- **Fix Rate:** 90%

## Root Cause

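Many CSV parsers do not strip a UTF-8 BOM (bytes 0xEF 0xBB 0xBF) at the start of a file. The BOM is decoded as U+FEFF and becomes the first character of the first field. If that field is quoted, it no longer begins with the quote character, so the parser treats it as unquoted and handles the quotes and any embedded delimiters as literal data.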
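
The effect can be reproduced in memory with Python's `csv` module. This is a minimal sketch; the helper name `parse_first_row` and the sample row are illustrative assumptions, not part of the original report.

```python
import csv
from io import StringIO

def parse_first_row(text):
    """Parse CSV text and return its first row as a list of fields."""
    return next(csv.reader(StringIO(text)))

# Hypothetical sample: a quoted first field that contains a comma.
sample = '"a,b",c\n'

without_bom = parse_first_row(sample)
# -> ['a,b', 'c']

with_bom = parse_first_row('\ufeff' + sample)
# -> ['\ufeff"a', 'b"', 'c']  (quotes kept as data, field split at the comma)
```

With the BOM in front, the opening quote is no longer the first character of the field, so the reader splits on the embedded comma and returns three fields instead of two.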

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| Python 3.11 csv module | active | — | — |
| Apache Commons CSV 1.10.0 | active | — | — |
| Pandas 2.1.0 | active | — | — |

## Workarounds

1. **Strip the BOM before CSV parsing** (95% success)
   ```python
   import csv
   from io import StringIO

   # utf-8-sig drops a leading BOM while decoding; lstrip also removes a repeated BOM.
   with open('file.csv', 'r', encoding='utf-8-sig', newline='') as f:
       content = f.read().lstrip('\ufeff')
   reader = csv.reader(StringIO(content))
   ```
2. **Use pandas with `encoding='utf-8-sig'`** (90% success)
   ```python
   import pandas as pd

   # utf-8-sig tells the decoder to discard a leading BOM before parsing.
   df = pd.read_csv('file.csv', encoding='utf-8-sig')
   ```
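
To check that a workaround took effect, the same file can be parsed with and without BOM handling and the first row compared. The sketch below assumes a throwaway temporary file; `read_rows` and the sample content are hypothetical names chosen for illustration.

```python
import codecs
import csv
import os
import tempfile

def read_rows(path, encoding):
    """Stream all rows from a CSV file opened with the given encoding."""
    with open(path, 'r', encoding=encoding, newline='') as f:
        return list(csv.reader(f))

# Hypothetical sample file: a UTF-8 BOM followed by a quoted first field.
fd, sample_path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as f:
    f.write(codecs.BOM_UTF8 + b'"id,name",value\n')

rows_sig = read_rows(sample_path, 'utf-8-sig')   # BOM removed by the codec
rows_plain = read_rows(sample_path, 'utf-8')     # BOM kept as U+FEFF
os.remove(sample_path)
```

In this sketch `rows_sig` holds one row with two fields, while `rows_plain` holds one row with three fields whose first value starts with `\ufeff`. A first field or header name that begins with `\ufeff` is a quick sign that the BOM reached the parser.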

## Dead Ends

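- **Relying on 'utf-8-sig' alone in Python** — The codec strips one leading BOM during decoding, so csv.reader normally never sees it. It leaves any further U+FEFF characters in place (for example a doubled BOM), and it has no effect if the data was already decoded elsewhere with plain 'utf-8'; in those cases the BOM still ends up in the first field. (50% fail)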
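- **Manually skipping the first three bytes with file.seek(3)** — The UTF-8 BOM is always three bytes, but an unconditional seek discards real data when the file has no BOM, and it does not handle a repeated BOM or the BOM of another encoding (UTF-16 uses two bytes, UTF-32 four). (60% fail)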
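
If byte-level skipping is still preferred, the skip can at least be made conditional on the BOM being present. The following is a sketch under the assumption that the input is a seekable binary file object; `rows_skipping_bom` is a hypothetical helper, not an API of any library.

```python
import codecs
import csv
import io

def rows_skipping_bom(binary_file):
    """Return CSV rows from a binary file object, skipping a UTF-8 BOM only if present."""
    head = binary_file.read(len(codecs.BOM_UTF8))
    if head != codecs.BOM_UTF8:
        binary_file.seek(0)  # no BOM found: rewind so no data is lost
    text = binary_file.read().decode('utf-8')
    return list(csv.reader(io.StringIO(text, newline='')))

with_bom = rows_skipping_bom(io.BytesIO(codecs.BOM_UTF8 + b'"a,b",c\n'))
without_bom = rows_skipping_bom(io.BytesIO(b'"a,b",c\n'))
# Both calls return [['a,b', 'c']].
```

An unconditional `seek(3)` on the BOM-less input would instead drop the bytes `"a,` and leave the parser with a damaged first field.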
