data encoding_error ai_generated true

CSV file silently corrupts special characters when opened in Excel due to Latin-1 vs UTF-8 encoding mismatch

ID: data/csv-encoding-mismatch-latin1

Fix Rate: 85%
Confidence: 88%
Evidence: 1
First Seen: 2023-03-10

Version Compatibility

Version                 Status   Introduced   Deprecated   Notes
Microsoft Excel 2021    active
Microsoft Excel 365     active
LibreOffice Calc 7.5    active

Root Cause

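When a CSV file has no byte order mark (BOM), Excel typically decodes it with the system's legacy ANSI code page (Windows-1252 on Western systems, often loosely called Latin-1) rather than UTF-8. Modern data tools usually export UTF-8 without a BOM, so each multi-byte character such as ü, ñ, or € is displayed as two or three garbled characters (for example, ü appears as Ã¼). The file itself is intact; only the interpretation of its bytes is wrong.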
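The mismatch can be reproduced without Excel. The following is a minimal sketch, assuming the reader behaves like a Windows-1252 decoder applied to UTF-8 bytes; the helper name `misread_as_cp1252` is hypothetical and exists only for this illustration.

```python
# Hypothetical helper: encode text as UTF-8, then decode the same bytes as
# Windows-1252, mimicking a reader that assumes the legacy code page.
def misread_as_cp1252(text: str) -> str:
    return text.encode("utf-8").decode("cp1252")


# The sample characters named in the root cause above.
for ch in ["ü", "ñ", "€"]:
    print(ch, "->", misread_as_cp1252(ch))
```

Each character turns into a short run of unrelated Windows-1252 characters, which is the garbling users see in the spreadsheet.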



Official Documentation

https://support.microsoft.com/en-us/office/import-or-export-text-txt-or-csv-files-5250ac4c-663c-47ce-937b-d0b5f933c3a9

Workarounds

  1. 90% success Add a UTF-8 BOM to the CSV file: printf '\357\273\277' > output.csv; cat original.csv >> output.csv; then open output.csv in Excel. The octal escapes are the BOM bytes EF BB BF. Use printf rather than echo -e, because echo appends a newline after the BOM, which leaves an empty first row in the sheet.
  2. 85% success Use Python to convert the CSV to Windows-1252, the code page Excel assumes on Western systems: with open('input.csv', 'r', encoding='utf-8') as f, open('output.csv', 'w', encoding='cp1252', errors='replace') as out: out.write(f.read()). Python's 'latin-1' codec (ISO-8859-1) has no € and raises UnicodeEncodeError on it; 'cp1252' includes €, and errors='replace' writes '?' for any character the code page cannot represent.
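Both workarounds can also be written as small Python functions. This is a sketch rather than a definitive implementation: the function names `add_utf8_bom` and `convert_to_cp1252` are hypothetical, the input is assumed to be valid UTF-8, and Windows-1252 is assumed to be the code page Excel falls back to on the target machine.

```python
import codecs
from pathlib import Path


def add_utf8_bom(src: str, dst: str) -> None:
    """Copy src to dst so that dst starts with exactly one UTF-8 BOM."""
    data = Path(src).read_bytes()
    # Drop an existing BOM first so the function is safe to run twice.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    Path(dst).write_bytes(codecs.BOM_UTF8 + data)


def convert_to_cp1252(src: str, dst: str) -> None:
    """Re-encode a UTF-8 CSV as Windows-1252 (assumed target code page)."""
    # "utf-8-sig" decodes UTF-8 and strips a leading BOM if one is present.
    text = Path(src).read_bytes().decode("utf-8-sig")
    # Characters outside Windows-1252 are written as "?" instead of raising.
    Path(dst).write_bytes(text.encode("cp1252", errors="replace"))
```

For example, `add_utf8_bom("original.csv", "output.csv")` corresponds to the first workaround and `convert_to_cp1252("input.csv", "output.csv")` to the second. Working on bytes leaves line endings untouched. The second function is lossy for any character Windows-1252 cannot represent, so the BOM approach is the safer choice when the data contains such characters.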


Dead Ends

Common approaches that don't work:

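  1. Adding a UTF-8 BOM without checking that the target Excel version and any downstream tools handle it 60% fail

    A UTF-8 BOM lets Excel detect the encoding correctly, but tools that do not strip it read the BOM as an invisible character (U+FEFF) prefixed to the first column header, so lookups on that column name fail.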

  2. Converting the file to UTF-16, which Excel can read but which causes other issues 75% fail

    This changes the file's encoding and may break downstream systems that expect UTF-8.
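Both dead ends can be checked from the consumer's side. The sketch below assumes a downstream tool that reads the CSV with Python's csv module; the sample data and the helper names `first_header` and `decodes_as_utf8` are hypothetical.

```python
import csv
import io

# Hypothetical sample data standing in for an exported CSV.
text = "name,city\nJosé,München\n"


def first_header(data: bytes, encoding: str) -> str:
    """Decode CSV bytes with the given codec and return the first header cell."""
    reader = csv.reader(io.StringIO(data.decode(encoding), newline=""))
    return next(reader)[0]


def decodes_as_utf8(data: bytes) -> bool:
    """Report whether the bytes are valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


bom_bytes = text.encode("utf-8-sig")  # UTF-8 with a leading BOM
utf16_bytes = text.encode("utf-16")   # UTF-16 with a leading BOM

print(repr(first_header(bom_bytes, "utf-8")))      # BOM stays attached to the header
print(repr(first_header(bom_bytes, "utf-8-sig")))  # BOM is stripped
print(decodes_as_utf8(utf16_bytes))                # a UTF-8-only consumer rejects this
```

A consumer that can be changed to read with "utf-8-sig" avoids the header problem in the first dead end. A UTF-16 file offers no comparable fix on the consumer side short of changing the codec it expects.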