# CSV parser fails to recognize quoted fields when the file starts with a UTF-8 BOM

- **ID:** `data/csv-utf8-bom-quote-mismatch`
- **Domain:** data
- **Category:** encoding_error
- **Error code:** `csv.Error: field larger than field limit (131072)`
- **Verification level:** ai_generated
- **Fix rate:** 90%

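## Root cause

Many CSV parsers do not strip the UTF-8 BOM (bytes `EF BB BF`) at the start of a file, so the BOM becomes part of the first field. If that field is quoted, the opening quote is no longer the first character of the field, which breaks quote detection.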
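The effect can be reproduced in a few lines. The following is a minimal sketch, assuming Python's built-in `csv` module with default dialect settings; the sample bytes and the helper name `first_row` are made up for illustration.

```python
import csv
import io

# Hypothetical sample: a quoted first field preceded by a UTF-8 BOM.
raw = b'\xef\xbb\xbf"name, full",age\r\n"Doe, Jane",30\r\n'


def first_row(data: bytes, encoding: str) -> list[str]:
    """Decode the bytes with the given encoding and parse the first CSV row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text, newline='')))


# Plain 'utf-8' keeps U+FEFF, so the opening quote is not at the start of the
# field and the comma inside the quotes is treated as a delimiter.
with_bom = first_row(raw, 'utf-8')

# 'utf-8-sig' drops the leading BOM, so the quoted field parses as one value.
without_bom = first_row(raw, 'utf-8-sig')

print(with_bom)     # three fields, the first one starting with U+FEFF
print(without_bom)  # two fields
```

With the BOM left in place, the header splits into three fields instead of two, which is the kind of quote mismatch this entry describes.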

## Version compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| Python 3.11 csv module | active | — | — |
| Apache Commons CSV 1.10.0 | active | — | — |
| Pandas 2.1.0 | active | — | — |

## Solutions

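1. Strip the BOM before CSV parsing by opening the file with `encoding='utf-8-sig'`:
   ```
   import csv
   from io import StringIO

   # 'utf-8-sig' drops one leading BOM; lstrip removes any further stray U+FEFF
   with open('file.csv', 'r', encoding='utf-8-sig', newline='') as f:
       content = f.read().lstrip('\ufeff')
   reader = csv.reader(StringIO(content, newline=''))
   ```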
2. Use pandas with `encoding='utf-8-sig'`:
   ```
   import pandas as pd

   df = pd.read_csv('file.csv', encoding='utf-8-sig')
   ```
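When the CSV data arrives as a binary stream (an upload or a network response) rather than a path on disk, the same idea can be applied by wrapping the stream. This is a sketch only, assuming the payload is UTF-8 with or without a BOM; `read_csv_rows` is a hypothetical helper name, not part of any library.

```python
import csv
import io


def read_csv_rows(binary_stream) -> list[list[str]]:
    """Parse CSV rows from a binary stream, ignoring a leading UTF-8 BOM."""
    # 'utf-8-sig' strips the BOM if present and is a no-op otherwise;
    # newline='' leaves line-ending handling to the csv module.
    text_stream = io.TextIOWrapper(binary_stream, encoding='utf-8-sig', newline='')
    return list(csv.reader(text_stream))


rows_bom = read_csv_rows(io.BytesIO(b'\xef\xbb\xbf"a,b",c\r\n'))
rows_plain = read_csv_rows(io.BytesIO(b'"a,b",c\r\n'))
print(rows_bom, rows_plain)
```

Because `utf-8-sig` accepts input with or without a BOM, the caller does not need to know in advance which kind of file it received.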

## Failed attempts

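- **Opening the file with the `utf-8-sig` encoding in Python:** this strips one leading BOM during text decoding, but `csv.reader` still sees U+FEFF as part of the first field whenever the BOM is not removed before parsing, for example when the data reaches the parser through another code path that decodes it as plain `utf-8`. (50% failure rate)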
- **Manually skipping the first three bytes with `file.seek(3)`:** this only works when the file starts with exactly the 3-byte UTF-8 BOM. It cuts off real data in files without a BOM and does not account for other leading bytes that some editors may write. (60% failure rate)
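If the bytes have to be handled manually, a conditional check avoids the main pitfall of an unconditional `seek(3)`. The sketch below assumes the data is already in memory as `bytes`; `strip_utf8_bom` is a hypothetical helper, and `codecs.BOM_UTF8` is the standard-library constant for the bytes `EF BB BF`.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; leave BOM-less data untouched."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + b'a,b\n1,2\n'
no_bom = b'a,b\n1,2\n'

cleaned_with_bom = strip_utf8_bom(with_bom)
cleaned_no_bom = strip_utf8_bom(no_bom)

# An unconditional skip of three bytes, by contrast, eats the header of a
# BOM-less file.
naive_skip = no_bom[3:]
print(cleaned_with_bom, cleaned_no_bom, naive_skip)
```

Both inputs come out identical after the conditional strip, while the naive three-byte skip loses the `a,b` header of the BOM-less input.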
