Converting UTF-8 to ISO-8859-1 in Java - how to keep it as single byte

Anonymous (unverified), posted 2019-12-03 02:05:01

Question:

Answer 1:

If you're dealing with character encodings other than UTF-16, you shouldn't be using java.lang.String or the char primitive -- you should only be using byte[] arrays or ByteBuffer objects. Then, you can use java.nio.charset.Charset to convert between encodings:

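    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;

    Charset utf8charset = Charset.forName("UTF-8");
    Charset iso88591charset = Charset.forName("ISO-8859-1");

    ByteBuffer inputBuffer = ByteBuffer.wrap(new byte[]{(byte)0xC3, (byte)0xA2});

    // decode UTF-8
    CharBuffer data = utf8charset.decode(inputBuffer);

    // encode ISO-8859-1
    ByteBuffer outputBuffer = iso88591charset.encode(data);

    // copy only the encoded bytes; array() may expose unused trailing capacity
    byte[] outputData = new byte[outputBuffer.remaining()];
    outputBuffer.get(outputData);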
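One thing the snippet above does not show is what happens when the UTF-8 input contains a character that ISO-8859-1 cannot represent: Charset.encode() quietly substitutes a replacement byte. If you would rather get an error, something along these lines might work. This is only a sketch: the class name StrictTranscoder and the helper utf8ToLatin1 are made up for illustration, and it assumes Java 7 or later for StandardCharsets.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictTranscoder {
    // Hypothetical helper: converts UTF-8 bytes to ISO-8859-1 bytes and
    // throws instead of silently replacing characters that do not fit.
    static byte[] utf8ToLatin1(byte[] utf8) throws CharacterCodingException {
        CharBuffer chars = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(utf8));
        ByteBuffer encoded = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(chars);
        // Copy only the bytes that were actually written.
        byte[] out = new byte[encoded.remaining()];
        encoded.get(out);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] aCircumflex = {(byte) 0xC3, (byte) 0xA2};        // U+00E2 in UTF-8
        System.out.println(utf8ToLatin1(aCircumflex).length);    // a single byte, 0xE2

        byte[] euro = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC};   // U+20AC in UTF-8
        try {
            utf8ToLatin1(euro);
        } catch (CharacterCodingException e) {
            System.out.println("not representable in ISO-8859-1: " + e);
        }
    }
}
```

Whether to fail or to substitute is a design choice; failing loudly is usually easier to debug when the goal is to guarantee one byte per character.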


Answer 2:

    byte[] iso88591Data = theString.getBytes("ISO-8859-1");

Will do the trick. From your description it seems as if you're trying to "store an ISO-8859-1 String". String objects in Java are always implicitly encoded in UTF-16. There's no way to change that encoding.

What you can do, though, is get the bytes that constitute some other encoding of it (using the .getBytes() method as shown above).
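To make the distinction concrete, here is a small sketch that encodes the same one-character String in several charsets and compares the byte counts. The class EncodingLengths and the helper encodedLength are hypothetical names for illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingLengths {
    // Hypothetical helper: number of bytes the string occupies in a given charset.
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "\u00e2"; // a-circumflex, a single UTF-16 code unit inside the String
        System.out.println(s.length());                                  // 1
        System.out.println(encodedLength(s, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));      // 2
        System.out.println(encodedLength(s, StandardCharsets.UTF_16BE));   // 2
    }
}
```

The String itself never changes; only the byte array you ask for depends on the encoding.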



Answer 3:

Start with a set of bytes that encode a string in UTF-8, create a string from that data, then get the bytes that encode the string in a different encoding:

    import java.nio.charset.Charset;

    byte[] utf8bytes = { (byte)0xc3, (byte)0xa2, 0x61, 0x62, 0x63, 0x64 };
    Charset utf8charset = Charset.forName("UTF-8");
    Charset iso88591charset = Charset.forName("ISO-8859-1");

    // decode the UTF-8 bytes into a String
    String string = new String ( utf8bytes, utf8charset );

    System.out.println(string);

    // "When I do a getbytes(encoding) and "
    byte[] iso88591bytes = string.getBytes(iso88591charset);

    for ( byte b : iso88591bytes )
        System.out.printf("%02x ", b);

    System.out.println();

    // "then create a new string with the bytes in ISO-8859-1 encoding"
    String string2 = new String ( iso88591bytes, iso88591charset );

    // "I get a two different chars"
    System.out.println(string2);

This prints the strings and the ISO-8859-1 bytes correctly:

    âabcd
    e2 61 62 63 64
    âabcd

So your byte array wasn't paired with the correct encoding:

    String failString = new String ( utf8bytes, iso88591charset );

    System.out.println(failString);

Outputs:

    Ã¢abcd

(either that, or you just wrote the utf8 bytes to a file and read them elsewhere as iso88591)
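If that is what happened, the damage is often reversible, because Java's ISO-8859-1 charset maps every byte value 0x00 to 0xFF to exactly one character and back. A rough sketch of undoing the mismatch follows. MojibakeRepair and repair are hypothetical names, and this assumes the wrong decoding really was ISO-8859-1 (not, say, windows-1252) and that nothing was lost along the way.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {
    // Hypothetical helper: recovers text whose UTF-8 bytes were wrongly
    // decoded as ISO-8859-1 by re-encoding and decoding with the right charset.
    static String repair(String misdecoded) {
        // Each char came from one byte, so this restores the original bytes.
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8bytes = {(byte) 0xc3, (byte) 0xa2, 0x61, 0x62, 0x63, 0x64};
        // Same mismatch as above: UTF-8 bytes read as ISO-8859-1.
        String failString = new String(utf8bytes, StandardCharsets.ISO_8859_1);
        System.out.println(failString.length());          // 7 chars instead of 5
        System.out.println(repair(failString).length());  // 5 chars again
    }
}
```

Fixing the decoding at the point where the bytes are first read is still the better option; this only helps when you are handed an already-garbled String.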



Answer 4:

This is what I needed:

    public static byte[] encode(byte[] arr, String fromCharsetName) {
        return encode(arr, Charset.forName(fromCharsetName), Charset.forName("UTF-8"));
    }

    public static byte[] encode(byte[] arr, String fromCharsetName, String targetCharsetName) {
        return encode(arr, Charset.forName(fromCharsetName), Charset.forName(targetCharsetName));
    }

    public static byte[] encode
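The listing breaks off before the Charset-based overload that the two String-based wrappers delegate to, so its real body is unknown. Purely as an assumption about what such an overload might look like (the class name Transcode and the parameter names are invented here), a minimal version could decode with the source charset and re-encode with the target one:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

final class Transcode {
    // Assumed body, not the original author's code: decode the input bytes
    // with the source charset, then encode the characters with the target one.
    static byte[] encode(byte[] arr, Charset fromCharset, Charset targetCharset) {
        CharBuffer chars = fromCharset.decode(ByteBuffer.wrap(arr));
        ByteBuffer encoded = targetCharset.encode(chars);
        // The backing array can be larger than the encoded data, so copy
        // exactly the remaining bytes instead of calling array().
        byte[] out = new byte[encoded.remaining()];
        encoded.get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE2}; // a-circumflex in ISO-8859-1
        byte[] utf8 = encode(latin1, Charset.forName("ISO-8859-1"), Charset.forName("UTF-8"));
        for (byte b : utf8) {
            System.out.printf("%02x ", b); // c3 a2
        }
        System.out.println();
    }
}
```

Note that Charset.decode() and Charset.encode() replace malformed or unmappable input rather than throwing, which may or may not be what you want.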