Character encoding in Excel spreadsheet (and what Java charset to use to decode it)

Submitted by 百般思念 on 2020-01-02 06:00:34

Question


I am using the JExcel library to read Excel spreadsheets. Each cell in the spreadsheet may contain localization strings in any of about 44 languages (English, Portuguese, French, Chinese, etc.). Today I don't tell the API anything about the encoding it's supposed to use. It handles the Chinese OK, but it always screws up Portuguese and German. Somehow the default encoding (MacRoman on my dev box, UTF-8 on production) fails to properly interpret the strings it pulls out of the Excel workbook. There has to be something wrong with how JExcel is interpreting the character encoding of the file.

That being said...

Are all the strings in an Excel workbook encoded with the same character set?

Is there workbook meta-data I can ask what this character set is (I haven't found it yet)?

If I run all the cells through something like jchardet (http://jchardet.sourceforge.net/), is it likely to be able to divine the character encoding for the whole workbook (this is pretty much predicated on the answer to the first question being "yes, all strings in a given workbook are encoded with the same character set")?

So many questions, so little time.


Answer 1:


Well, I didn't get an answer directly, but Matt's discovery of a spec points the way towards an actual answer: http://sc.openoffice.org/excelfileformat.pdf

In the meantime, my problem went away when I set the encoding to always be "Cp1252" (Java's name for the Windows-1252 charset). I'm not sure exactly why, but I'm not looking a gift horse in the mouth, so to speak, and am moving on.

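    import jxl.Workbook;
    import jxl.WorkbookSettings;

    WorkbookSettings workbookSettings = new WorkbookSettings();
    workbookSettings.setEncoding( "Cp1252" ); // override the platform default charset
    Workbook workbook = Workbook.getWorkbook( theFile, workbookSettings );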

I'll call this one answered.
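
One plausible explanation for why "Cp1252" helps, offered as a sketch and not as a confirmed description of JExcel's internals: if the Portuguese and German strings are stored as single-byte Windows-1252 text while the Chinese strings are stored as 16-bit characters, then only the single-byte strings depend on the configured encoding. The standalone example below uses only the JDK (no JExcel); the class name `EncodingMismatchDemo`, the `decode` helper, and the sample bytes are made up for illustration. It decodes the same three bytes once as Windows-1252 and once as UTF-8.

```java
import java.nio.charset.Charset;

public class EncodingMismatchDemo {

    // Decode the same raw bytes with a caller-chosen charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Hypothetical sample: c-cedilla, a-tilde, 'o' as single-byte
        // Windows-1252 text (0xE7, 0xE3, 0x6F).
        byte[] raw = { (byte) 0xE7, (byte) 0xE3, 0x6F };
        String expected = "\u00E7\u00E3o";

        String asCp1252 = decode(raw, "windows-1252");
        String asUtf8 = decode(raw, "UTF-8");

        // The Windows-1252 decode recovers the original text.
        System.out.println("windows-1252 matches: " + asCp1252.equals(expected));
        // The byte sequence is not valid UTF-8, so the decoder substitutes
        // the replacement character U+FFFD and the text is lost.
        System.out.println("UTF-8 matches: " + asUtf8.equals(expected));
        System.out.println("UTF-8 result contains U+FFFD: " + asUtf8.contains("\uFFFD"));
    }
}
```

If this is what is happening, a fixed encoding also makes the result independent of the machine's default charset, which would explain why the dev box and production no longer differ.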




Answer 2:


I have a similar problem: when reading cell values from the Excel file, some values come back with "?" in place of accented letters. Would that code resolve this issue? I develop on Windows, so I cannot test this as quickly as I could on Linux (which is the OS of the server I deploy to).
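
A literal "?" in place of an accented letter is often a sign that the text passed through a charset that cannot represent that letter. Whether that is the cause here is an assumption, since the post does not say which default charset the Linux server uses. The standalone sketch below uses only the JDK; the class name `QuestionMarkDemo`, the `roundTrip` helper, and the sample word are made up for illustration. It shows that encoding an accented character to US-ASCII silently substitutes "?", while UTF-8 preserves it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    // Encode text to bytes in the given charset, then decode it back,
    // the way a lossy write/read round trip would.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "caf\u00E9"; // "cafe" with an acute accent on the e

        // US-ASCII has no e-acute, so getBytes substitutes '?'.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII)); // prints "caf?"

        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // prints "true"
    }
}
```

Printing `Charset.defaultCharset()` on both the Windows machine and the Linux server is a quick way to check whether the two environments differ.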



Source: https://stackoverflow.com/questions/7449285/character-encoding-in-excel-spreadsheet-and-what-java-charset-to-use-to-decode
