Are XLSX files UTF-8 encoded by definition?

情到浓时终转凉″ 提交于 2020-12-05 10:29:47

问题


I'm trying to read in XLSX files with PHP. Using gneustaetter/XLSXReader to be exact. However, these XLSX-files are generated by different companies, using different software. So I wanted to check if they have the right encoding and always just found UTF-8.

Therefore my question as above: Are XLSX files UTF-8 encoded by definition? Or are there exceptions that could break the import script I'm working on?


回答1:


It'd be risky to presume it's always UTF-8. I'd just key your expectations to what the XML describes in the XML header. In my experience Windows-1252 encoded data shows up all the time when you least expect it. You might check the XLSX specification more closely to find out more.

Here's a Chromium bug relating to a Windows-1252 encoded XLSX file, so these seem to exist in the wild. Maybe they're produced by programs other than Microsoft Office. With things like LibreOffice becoming more popular, older versions that may not have had the most robust XLSX support might end up interacting with your code. You probably don't want to have a bug like this show up in your code.

Try and be as accommodating as possible unless you have a concrete reason for rejecting invalid encoding. JSON, by strict definition, is UTF-8. XLSX seems to be XML by definition, but the encoding is not as nailed down. UTF-8 simply seems to be the default convention.



来源:https://stackoverflow.com/questions/45194771/are-xlsx-files-utf-8-encoded-by-definition

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!