data encoding_error ai_generated true

CSV file with UTF-8 BOM causes silent data corruption in Excel on Windows

ID: data/csv-encoding-utf8-with-bom-silent-corruption

Fix rate: 92%
Confidence: 90%
Evidence count: 1
First seen: 2023-05-18

Version compatibility

Version      Status   Introduced   Deprecated   Notes
Excel 2019   active
Excel 365    active
Excel 2021   active

Root cause analysis

When a CSV file is opened directly, Excel on Windows interprets BOM-less UTF-8 as the system ANSI code page (Windows-1252 on Western European locales), which corrupts non-ASCII characters without any warning. Adding a BOM fixes Excel's encoding detection but may cause problems in other tools that do not expect a BOM.
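
The corruption described above can be reproduced without Excel. The following is a minimal sketch, assuming a Western European system where the ANSI code page is Windows-1252 (Python codec name `cp1252`); the sample string is made up for illustration.

```python
import codecs

# Hypothetical sample row containing non-ASCII characters.
text = "café,naïve,Zürich"

# Bytes as written by a BOM-less UTF-8 export.
raw = text.encode("utf-8")

# What a reader sees if it assumes Windows-1252 instead of UTF-8:
# each two-byte UTF-8 sequence turns into two unrelated characters.
misread = raw.decode("cp1252")

# The 'utf-8-sig' codec prepends the three-byte BOM (EF BB BF),
# which is the marker Excel uses to recognize the file as UTF-8.
with_bom = text.encode("utf-8-sig")
starts_with_bom = with_bom.startswith(codecs.BOM_UTF8)

print(misread)
print(starts_with_bom)
```

No error is raised at any point, which is why the damage is easy to miss until the data is inspected.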

generic

Official documentation

https://support.microsoft.com/en-us/office/import-or-export-text-txt-or-csv-files-5250ac4c-663c-47ce-937b-339e391393ba

Solutions

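  1. Add a UTF-8 BOM to the CSV file before opening it in Excel. In Python, open the output with the 'utf-8-sig' encoding, which writes the BOM automatically, and pass newline='' so the csv module controls line endings: with open('output.csv', 'w', encoding='utf-8-sig', newline='') as f: csv.writer(f).writerows(data). On the command line with GNU sed (which understands \xHH escapes): sed '1s/^/\xef\xbb\xbf/' input.csv > output.csv. Run this only on files that do not already start with a BOM, otherwise a second one is prepended.
  2. Import the file with Excel's 'Get Data from Text/CSV' feature instead of double-clicking it: Data tab > Get Data > From File > From Text/CSV. Then choose UTF-8 encoding explicitly in the import dialog.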
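
The first solution, together with the side effect on tools that do not expect a BOM, can be sketched as follows. This is an illustrative example rather than a definitive implementation: the helper names `write_csv_with_bom` and `read_csv`, the sample rows, and the temporary file path are assumptions made for the demonstration.

```python
import csv
import os
import tempfile

# Hypothetical sample data with non-ASCII characters.
rows = [["name", "city"], ["José", "Zürich"]]


def write_csv_with_bom(path, rows):
    """Write rows as UTF-8 with a leading BOM so Excel detects the encoding."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


def read_csv(path, encoding):
    """Read all rows from a CSV file using the given text encoding."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return list(csv.reader(f))


path = os.path.join(tempfile.mkdtemp(), "output.csv")
write_csv_with_bom(path, rows)

# The file now starts with the three BOM bytes EF BB BF.
with open(path, "rb") as f:
    has_bom = f.read(3) == b"\xef\xbb\xbf"

# A reader that assumes plain UTF-8 keeps the BOM as U+FEFF in the
# first header cell, which can break column lookups by name.
plain = read_csv(path, "utf-8")

# Reading with 'utf-8-sig' strips the BOM if present and is also
# safe for files that have none.
tolerant = read_csv(path, "utf-8-sig")

print(has_bom, repr(plain[0][0]), repr(tolerant[0][0]))
```

Downstream Python consumers that may receive either variant can read with 'utf-8-sig' unconditionally, since that codec accepts files with or without a BOM.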

Failed attempts

Common but ineffective approaches:

  1. 55% failure rate

    This option adds a BOM but also changes the file format slightly (e.g., quoting rules), so the file may not re-import correctly.

  2. 80% failure rate

    UTF-16 is not widely supported by CSV parsers and will cause problems with most data processing tools. It also roughly doubles the file size for mostly-ASCII data.
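
The size claim for UTF-16 can be checked with a short calculation. This is a minimal sketch using a made-up, ASCII-only sample; files with many non-ASCII characters will show a smaller ratio.

```python
# Hypothetical ASCII-only CSV content.
sample = "id,name,city\n1,Alice,Paris\n2,Bob,Berlin\n"

# UTF-8 uses one byte per ASCII character.
utf8_size = len(sample.encode("utf-8"))

# Python's 'utf-16' codec uses two bytes per ASCII character
# and prepends a 2-byte BOM.
utf16_size = len(sample.encode("utf-16"))

print(utf8_size, utf16_size)
```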