How to add a UTF-8 BOM in java

安稳与你 提交于 2019-11-26 05:36:48

问题


I have a Java stored procedure which fetches record from the table using Resultset object and creates a csv file.

BLOB retBLOB = BLOB.createTemporary(conn, true, BLOB.DURATION_SESSION);
retBLOB.open(BLOB.MODE_READWRITE);
OutputStream bOut = retBLOB.setBinaryStream(0L);
ZipOutputStream zipOut = new ZipOutputStream(bOut);
PrintStream out = new PrintStream(zipOut,false,\"UTF-8\");
out.write(\'\\ufeff\');
out.flush();
zipOut.putNextEntry(new ZipEntry(\"filename.csv\"));
while (rs.next()){
    out.print(\"\\\"\" + rs.getString(i) + \"\\\"\");
    out.print(\",\");
}
out.flush();
zipOut.closeEntry();
zipOut.close();
retBLOB.close();
return retBLOB;

But the generated csv file doesn\'t show the correct german character. Oracle database also has a NLS_CHARACTERSET value of UTF8.

Please suggest.


回答1:


To write a BOM in UTF-8 you need PrintStream.print(), not PrintStream.write().

Also if you want to have BOM in your csv file, I guess you need to print a BOM after putNextEntry().




回答2:


BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(...), StandardCharsets.UTF_8));
out.write('\ufeff');
out.write(...);

This correctly writes out 0xEF 0xBB 0xBF to the file, which is the UTF-8 representation of the BOM.




回答3:


Just in case people are using PrintStreams, you need to do it a little differently. While a Writer will do some magic to convert a single byte into 3 bytes, a PrintStream requires all 3 bytes of the UTF-8 BOM individually:

    // Print utf-8 BOM
    PrintStream out = System.out;
    out.write('\ufeef'); // emits 0xef
    out.write('\ufebb'); // emits 0xbb
    out.write('\ufebf'); // emits 0xbf

Alternatively, you can use the hex values for those directly:

    PrintStream out = System.out;
    out.write(0xef); // emits 0xef
    out.write(0xbb); // emits 0xbb
    out.write(0xbf); // emits 0xbf



回答4:


I think that out.write('\ufeff'); should actually be out.print('\ufeff');.

According the javadoc, the write(int) method actually writes a byte ... without any character encoding. So out.write('\ufeff'); writes the byte 0xff. By contrast, the print(char) method encodes the character as one or bytes using the stream's encoding, and then writes those bytes.




回答5:


In my case it works with the code:

PrintWriter out = new PrintWriter(new File(filePath), "UTF-8");
out.write(csvContent);
out.flush();
out.close();


来源:https://stackoverflow.com/questions/4389005/how-to-add-a-utf-8-bom-in-java

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!