data encoding_error ai_generated true

CSV file silently corrupts special characters when opened in Excel due to Latin-1 vs UTF-8 encoding mismatch

ID: data/csv-encoding-mismatch-latin1

Fix rate: 85%
Confidence: 88%
Evidence count: 1
First seen: 2023-03-10

Version compatibility

Version                 Status   Introduced   Deprecated   Notes
Microsoft Excel 2021    active
Microsoft Excel 365     active
LibreOffice Calc 7.5    active

Root cause analysis

When a CSV file has no byte order mark (BOM), Excel decodes it using the system ANSI code page, typically Windows-1252 on Western-locale Windows (often loosely called Latin-1). Modern data tools export UTF-8, so every non-ASCII character is misread: ü appears as Ã¼, ñ as Ã±, and € as â‚¬. Excel shows no error or warning when this happens.
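
The garbling can be reproduced outside Excel. The following is a minimal sketch that approximates the misreading by encoding text as UTF-8 and decoding the bytes as Windows-1252; the helper name misread_as_cp1252 is hypothetical and only illustrates the mechanism.

```python
def misread_as_cp1252(text: str) -> str:
    """Approximate what a Windows-1252 reader shows for UTF-8 encoded text."""
    # Each non-ASCII character becomes 2-4 UTF-8 bytes; a Windows-1252
    # reader then treats every one of those bytes as a separate character.
    return text.encode("utf-8").decode("cp1252")


for ch in ["ü", "ñ", "€"]:
    print(ch, "->", misread_as_cp1252(ch))
```

Seeing sequences such as Ã¼ or â‚¬ in a spreadsheet is therefore a strong hint that UTF-8 data was read with a single-byte code page.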

generic

Official documentation

https://support.microsoft.com/en-us/office/import-or-export-text-txt-or-csv-files-5250ac4c-663c-47ce-937b-d0b5f933c3a9

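Solutions

  1. Add a UTF-8 BOM to the CSV file: printf '\357\273\277' > output.csv; cat original.csv >> output.csv; then open output.csv in Excel. Use printf rather than echo -e, which appends a newline after the BOM and pushes the header down to the second row.
  2. Use Python to convert the CSV to Windows-1252 (Python's 'latin-1' codec is ISO-8859-1, which cannot encode €): with open('input.csv', 'r', encoding='utf-8', newline='') as f, open('output.csv', 'w', encoding='cp1252', newline='') as out: out.write(f.read()). Characters that do not exist in Windows-1252 raise UnicodeEncodeError.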
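
Both solutions can also be written as small Python helpers. This is a sketch under stated assumptions: the function names add_utf8_bom and convert_to_cp1252 are hypothetical, the input is assumed to be valid UTF-8, and the files are assumed small enough to read into memory.

```python
def add_utf8_bom(src: str, dst: str) -> None:
    """Copy a UTF-8 CSV to dst with exactly one leading BOM."""
    # "utf-8-sig" drops an existing BOM on read and writes one on output,
    # so running this twice does not stack BOMs.
    with open(src, "r", encoding="utf-8-sig", newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8-sig", newline="") as fout:
        fout.write(text)


def convert_to_cp1252(src: str, dst: str, errors: str = "strict") -> None:
    """Re-encode a UTF-8 CSV as Windows-1252."""
    with open(src, "r", encoding="utf-8-sig", newline="") as fin:
        text = fin.read()
    # With errors="strict", a character missing from Windows-1252 raises
    # UnicodeEncodeError; errors="replace" substitutes "?" instead.
    with open(dst, "w", encoding="cp1252", errors=errors, newline="") as fout:
        fout.write(text)
```

For example, add_utf8_bom("original.csv", "output.csv") mirrors solution 1. Passing newline="" keeps the original line endings unchanged on every platform.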

Ineffective attempts

Common but ineffective approaches:

  1. Adding a UTF-8 BOM to the file without verifying Excel version compatibility (60% failure rate)

    The BOM may let Excel detect the encoding correctly, but tools that read the file as plain UTF-8 see an invisible U+FEFF character prepended to the first column header.

  2. Converting the file to UTF-16, which Excel supports but which causes other issues (75% failure rate)

    This changes the file's encoding and may break downstream systems expecting UTF-8.
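
Both failure modes above can be reproduced with Python's standard library. This is an illustrative sketch, not a description of any particular downstream system; the sample data and the helper names header_fields and decodes_as_utf8 are hypothetical.

```python
import csv
import io

sample = "id,name\r\n1,Müller\r\n"


def header_fields(raw: bytes, encoding: str) -> list[str]:
    """Decode raw CSV bytes and return the first row."""
    return next(csv.reader(io.StringIO(raw.decode(encoding), newline="")))


def decodes_as_utf8(raw: bytes) -> bool:
    """Report whether a strict UTF-8 consumer could decode these bytes."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


with_bom = sample.encode("utf-8-sig")  # UTF-8 with a leading BOM
as_utf16 = sample.encode("utf-16")     # UTF-16 with a leading BOM

# Attempt 1: a plain UTF-8 reader keeps U+FEFF glued to the first header,
# so a lookup of the "id" column fails; "utf-8-sig" strips it.
print(header_fields(with_bom, "utf-8"))
print(header_fields(with_bom, "utf-8-sig"))

# Attempt 2: the UTF-16 bytes are not valid UTF-8 at all.
print(decodes_as_utf8(sample.encode("utf-8")), decodes_as_utf8(as_utf16))
```

A consumer that must accept files with or without a BOM can read with "utf-8-sig", which handles both cases.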