Setting UTF-8 in Java and a CSV file [duplicate]

戏子无情 submitted on 2019-11-27 01:07:52

Unfortunately, CSV is a very ad hoc format with no metadata and no real standard that would mandate a flexible encoding. As long as you use CSV, you can't reliably use any characters outside of ASCII.

Your alternatives:

  • Write to XML (which does have encoding metadata if you do it right) and have the users import the XML into Excel.
  • Use Apache POI to create actual Excel documents.
AlexR

It took some time, but I found a solution to your problem.

First I opened Notepad and wrote the following line: שלום, hello, привет. Then I saved it as the file he-en-ru.csv using UTF-8. When I opened it with MS Excel, everything displayed correctly.

Next, I wrote a simple Java program that prints the same line to a file as follows:

    PrintWriter w = new PrintWriter(new OutputStreamWriter(os, "UTF-8"));
    w.print(line);
    w.flush();
    w.close();

When I opened this file in Excel I saw gibberish.

Then I compared the contents of the two files and (as expected) saw that the file generated by Notepad starts with a 3-byte prefix:

    239 EF
    187 BB
    191 BF
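The check described above can be automated. The following is a minimal sketch, not part of the original answer: the class name `BomCheck` and the method `hasUtf8Bom` are made up for illustration, and it only looks at the first three bytes of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports whether a file starts with the bytes EF BB BF.
final class BomCheck {

    static boolean hasUtf8Bom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            // read() returns -1 at end of file, so short files simply yield false.
            return in.read() == 0xEF && in.read() == 0xBB && in.read() == 0xBF;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass one or more file paths on the command line.
        for (String arg : args) {
            System.out.println(arg + ": " + (hasUtf8Bom(Paths.get(arg)) ? "has BOM" : "no BOM"));
        }
    }
}
```

Running it against both files (the one saved by Notepad and the one written by Java) should show the difference directly.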

So, I modified my code to print this prefix first and the text after that:

    String line = "שלום, hello, привет";
    OutputStream os = new FileOutputStream("c:/temp/j.csv");
    os.write(239);
    os.write(187);
    os.write(191);

    PrintWriter w = new PrintWriter(new OutputStreamWriter(os, "UTF-8"));

    w.print(line);
    w.flush();
    w.close();

And it worked! I opened the file in Excel and saw the text as I expected.

Bottom line: write these 3 bytes (EF BB BF) before writing the content. This prefix is the byte order mark (BOM); it indicates that the content is 'UTF-8 with BOM' (otherwise it is just 'UTF-8 without BOM').
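For reference, here is the same idea as a self-contained sketch using try-with-resources and `StandardCharsets`. The class name `BomCsvWriter`, the method `writeWithBom`, and the default output name `j.csv` are assumptions for illustration; the technique (BOM bytes first, then UTF-8 text) is the one described above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: writes the UTF-8 BOM followed by UTF-8 encoded text.
final class BomCsvWriter {

    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static void writeWithBom(Path file, String content) throws IOException {
        try (OutputStream os = Files.newOutputStream(file);
             Writer w = new OutputStreamWriter(os, StandardCharsets.UTF_8)) {
            os.write(UTF8_BOM);   // raw prefix goes straight to the stream, before any text
            w.write(content);     // text is encoded as UTF-8; closing the writer flushes it
        }
    }

    public static void main(String[] args) throws IOException {
        // Same sample line as above, written with Unicode escapes so that the
        // encoding of this source file does not matter.
        String line = "\u05E9\u05DC\u05D5\u05DD, hello, \u043F\u0440\u0438\u0432\u0435\u0442";
        Path target = Paths.get(args.length > 0 ? args[0] : "j.csv");
        writeWithBom(target, line);
        System.out.println("Wrote " + target.toAbsolutePath());
    }
}
```

Writing the character `\uFEFF` through the writer as the first character should produce the same three bytes, since U+FEFF encodes to EF BB BF in UTF-8.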

Excel doesn't use UTF-8 by default to open CSV files. That's a known problem. The actual encoding used depends on the locale settings of Microsoft Windows. With a German locale, for example, Excel would open a CSV file as CP1252.

You could create an Excel file containing some Persian characters and save it as a CSV file. Then write a small Java program that reads this file and tries some common encodings. That's how I figured out the correct encoding for German umlauts in CSV files.
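A small program of that kind might look like the sketch below. It is an assumption of how such a test could be written, not the program actually used: the class name `EncodingProbe`, the method `decodeWith`, and the candidate charset list are illustrative, and charsets the running JVM does not support are skipped.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: decodes the same bytes with several charsets so the
// results can be compared by eye.
final class EncodingProbe {

    static Map<String, String> decodeWith(byte[] data, String... charsetNames) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String name : charsetNames) {
            if (Charset.isSupported(name)) {   // skip charsets this JVM lacks
                results.put(name, new String(data, Charset.forName(name)));
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java EncodingProbe <file.csv>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        // Illustrative candidates only; adjust to the locales you care about.
        Map<String, String> results =
                decodeWith(data, "UTF-8", "windows-1252", "ISO-8859-1", "windows-1256", "UTF-16LE");
        for (Map.Entry<String, String> e : results.entrySet()) {
            System.out.println("[" + e.getKey() + "] " + e.getValue());
        }
    }
}
```

The charset whose output shows the expected characters is the one Excel used when saving. Note that the console itself may not display every character correctly, so comparing against a known string in code can be more reliable than reading the output.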
