getBytes() doesn't work for Cyrillic letters

Submitted by 爷,独闯天下 on 2019-12-25 16:19:14

Question


I found some answers, but none of them works for me. I want to create a PDF file from HTML, but my HTML contains Cyrillic letters, and I found that the problem has something to do with this simple code:

String s = "Здраво Kris";

byte bytes[] = s.getBytes("UTF-8");

String value = new String(bytes, "ISO-8859-1");

// I tried with new String(bytes, "UTF-8") but it didn't work

Then I pass value to my PDF generator function, but the output contains only the part of s that is not Cyrillic, i.e. Kris:

 htp.CreatePDF("<html><head><title>kristijan</title></head><body><h1>" + value + "</h1></body></html>", "kris");
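A Java String is already Unicode, so the getBytes/new String round trip above can only change the text if the two sides use different charsets. A minimal sketch (the class and method names are hypothetical) of what the mismatched round trip does, compared with a matched one:

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Encodes with UTF-8 but decodes with ISO-8859-1, as in the question:
    // every Cyrillic letter (two bytes in UTF-8) turns into two Latin-1 characters.
    static String mismatchedRoundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Encodes and decodes with the same charset: the result equals the input,
    // so this conversion adds nothing and can simply be dropped.
    static String matchedRoundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Здраво Kris" written with Unicode escapes so this file stays ASCII-only
        String s = "\u0417\u0434\u0440\u0430\u0432\u043e Kris";
        System.out.println(mismatchedRoundTrip(s).equals(s)); // false
        System.out.println(matchedRoundTrip(s).equals(s));    // true
        System.out.println(s.length() + " chars vs " + mismatchedRoundTrip(s).length()); // 11 chars vs 17
    }
}
```

Because the matched round trip returns the original string, the conversion itself is not the fix; if the Cyrillic text still disappears, the cause has to be looked for in the source encoding, the HTML parsing, or the font.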

Answer 1:


Please take a look at my answer to this question: Can't get Czech characters while generating a PDF

Several things can go wrong in your code.

This is a very bad idea:

String s = "Здраво Kris";

Suppose that you send your .java file containing this code to somebody who saves it as ASCII. ASCII has no Cyrillic letters, so your source code will change into something like this:

String s = "?????? Kris";

I've also seen this happen when storing a source file in a source control system.

Bottom line: never rely on a particular file encoding when writing source code with hard-coded strings. Either store the strings in a file, using the right encoding both to write and to read them, or use Unicode escape notation (\uXXXX) if you insist on having hard-coded data in your source code.
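Both options can be sketched in a few lines of plain Java. This is an illustration only: the class name, the helper saveAndReload, and the choice of UTF-8 are assumptions, not part of the original answer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class EncodingSafeStrings {
    // Option 1: "Здраво Kris" written with Unicode escapes. The .java file stays
    // pure ASCII, so re-saving it in another encoding cannot corrupt the literal.
    static final String GREETING = "\u0417\u0434\u0440\u0430\u0432\u043e Kris";

    // Option 2: keep the text in an external file and name the charset
    // explicitly on both the write side and the read side.
    static String saveAndReload(String text, Path file) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("greeting", ".txt");
        try {
            System.out.println(saveAndReload(GREETING, tmp).equals(GREETING)); // true
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```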

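Even if you store the file containing this string correctly, you have to be very careful when compiling the code: if the compiler reads the source with a different encoding, s will be corrupted too. With javac you can state the source encoding explicitly, e.g. javac -encoding UTF-8.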

You also have to make sure that you're reading the data correctly when converting the HTML to PDF. I assume that you are using XML Worker (and not the obsolete HTMLWorker class). There are different places where you can indicate which encoding to use.
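The principle is that the charset used to turn the HTML into bytes must be the one the parser uses to read them back. A stdlib-only sketch of that contract (the class and its helpers are hypothetical). With iText 5's XML Worker, XMLWorkerHelper.getInstance().parseXHtml(...) has overloads that accept a Charset; verify the exact signature for your iText version.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class HtmlInputEncoding {
    // The charset used to turn the HTML into bytes; whatever reads the
    // bytes back (e.g. the HTML-to-PDF parser) must be told the same one.
    static final Charset HTML_CHARSET = StandardCharsets.UTF_8;

    static InputStream toStream(String html) {
        return new ByteArrayInputStream(html.getBytes(HTML_CHARSET));
    }

    // Reads the stream the way a parser would, with an explicit charset
    // instead of the platform default.
    static String read(InputStream in, Charset cs) {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, cs))) {
            return r.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>\u0417\u0434\u0440\u0430\u0432\u043e Kris</h1></body></html>";
        System.out.println(read(toStream(html), HTML_CHARSET).equals(html));                // true
        System.out.println(read(toStream(html), StandardCharsets.ISO_8859_1).equals(html)); // false
    }
}
```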

Finally, you have to make sure that you use a font that supports Cyrillic characters. For instance, if you use the default font, Helvetica, the Cyrillic characters will not be rendered.
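One way to see why Helvetica drops the Cyrillic letters is to check which characters its encoding can represent. This sketch assumes the default Helvetica is used with the WinAnsi encoding, which corresponds to Java's "windows-1252" charset; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class FontEncodingCheck {
    // Returns the characters of text that the named charset cannot represent.
    // Assumption: Helvetica with WinAnsi behaves like Java's "windows-1252".
    static String unencodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder missing = new StringBuilder();
        text.codePoints()
            .filter(cp -> !encoder.canEncode(new String(Character.toChars(cp))))
            .forEach(missing::appendCodePoint);
        return missing.toString();
    }

    public static void main(String[] args) {
        String s = "\u0417\u0434\u0440\u0430\u0432\u043e Kris";
        // Only the six Cyrillic letters are reported; "Kris" and the space fit.
        System.out.println(unencodable(s, "windows-1252").length()); // 6
    }
}
```

In iText 5 the usual remedy is to embed a TrueType/OpenType font that contains Cyrillic glyphs, using an encoding such as BaseFont.IDENTITY_H; how you register that font with XML Worker depends on your version.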

You can also find this information in the free ebook The Best iText Questions on StackOverflow.




Answer 2:


One way to work around the apparent inability of CreatePDF to handle the full Unicode range of characters (surprising, since Java strings are Unicode) would be to scan the string

String s = "Здраво Kris";

for characters above 0x7F, i.e. outside ASCII. Each of these must be replaced by the corresponding numeric HTML entity.

You can easily verify this by setting s to these entities and checking what happens when the string is embedded in the HTML.
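A minimal sketch of that replacement (the class and method names are hypothetical). It escapes only non-ASCII code points; characters such as & or < are left untouched, since escaping them is a separate concern.

```java
public class HtmlEntityEscaper {
    // Replaces every code point outside ASCII (above 0x7F) with a decimal
    // numeric character reference, e.g. U+0417 becomes "&#1047;".
    static String toNumericEntities(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "\u0417\u0434\u0440\u0430\u0432\u043e Kris";
        System.out.println(toNumericEntities(s));
        // prints &#1047;&#1076;&#1088;&#1072;&#1074;&#1086; Kris
    }
}
```

The escaped string would then go into the h1 element of the CreatePDF call from the question. This only changes how the text reaches the HTML parser; the font still needs Cyrillic glyphs for the letters to appear.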



Source: https://stackoverflow.com/questions/27685144/getbytes-doesnt-work-for-cyrillic-letters
