Byte array to String and back: issues with -127

野的像风 2020-12-08 14:20

In the following:

 scala> (new String(Array[Byte](1, 2, 3, -1, -2, -127))).getBytes
 res12: Array[Byte] = Array(1, 2, 3, -1, -2, 63)

Why does -127 come back as 63?

4 Answers
  • 2020-12-08 14:29

    The constructor you're calling, String(byte[] bytes), makes it non-obvious that every byte-to-string conversion applies a decoding; it behaves like String(byte[] bytes, Charset charset) with the platform's default charset. What you want is to use no decoding at all.

    Fortunately, there's a constructor for that: String(char[] value).

    Now you have the data in a string, but you want it back exactly as is. Here the same problem shows up in reverse: getBytes() behaves like getBytes(Charset charset) with the default charset, so an encoding is applied automatically as well. Fortunately, there is a toCharArray() method.

    If you must start with bytes and end with bytes, you then have to map the char arrays to bytes:

    (new String(Array[Byte](1,2,3,-1,-2,-127).map(_.toChar))).toCharArray.map(_.toByte)
    

    So, to summarize: converting between String and Array[Byte] always involves encoding and decoding. If you want to put binary data in a string, you have to do it at the level of characters. Note, however, that this gives you a garbage string (the chars carry no textual meaning, and some, such as U+FFFF for the byte -1, are not valid characters at all), so you had better read it out as characters and convert those back to bytes rather than treat it as text.

    You could shift the bytes up by, say, adding 512; then you'd get a bunch of valid single Char code points. But this is using 16 bits to represent every 8, a 50% encoding efficiency. Base64 is a better option for serializing binary data (8 bits to represent 6, 75% efficient).
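To make the Base64 suggestion concrete, here is a minimal sketch using java.util.Base64 from the JDK (Java 8 or later). The class and method names (Base64RoundTrip, encode, decode) are made up for illustration; only the Base64 calls come from the standard library.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class, not part of the original answer.
class Base64RoundTrip {
    // Encode arbitrary bytes as an ASCII-only string.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Recover the exact original bytes from the Base64 text.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3, -1, -2, -127};
        String text = encode(original);
        byte[] restored = decode(text);
        System.out.println(text);                      // AQID//6B
        System.out.println(Arrays.toString(restored)); // [1, 2, 3, -1, -2, -127]
    }
}
```

Because the resulting string contains only ASCII letters, digits, '+', '/' and '=', it should survive being re-encoded in any common charset, which is the point of using it here.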

  • 2020-12-08 14:32

    StringOps has a getBytes method; I think that is probably what one actually wants for converting a String to an Array[Byte]:

    http://www.scala-lang.org/api/2.10.2/index.html#scala.collection.immutable.StringOps

  • 2020-12-08 14:46

    String is for storing text not binary data.

    In your default character encoding there is no character for the byte -127 (0x81), so it gets replaced with '?', which is 63.
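The replacement can be reproduced on any JVM with a sketch like the following. The asker's actual default charset is unknown (the output would fit a single-byte charset such as windows-1252, where 0x81 is unassigned), so this example assumes US-ASCII instead, because it is guaranteed to be available; with US-ASCII every byte above 127 is affected, not just -127. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
class LossyDecodeDemo {
    // Decode bytes to a String and encode them straight back with the same charset.
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3, -1, -2, -127};

        // Decoding: bytes with no mapping become the replacement character U+FFFD.
        String decoded = new String(original, StandardCharsets.US_ASCII);
        System.out.println(Integer.toHexString(decoded.charAt(5))); // fffd

        // Encoding: U+FFFD has no US-ASCII byte, so the encoder writes '?' (63).
        byte[] back = roundTrip(original, StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(back)); // [1, 2, 3, 63, 63, 63]
    }
}
```

So the information is already lost at decoding time; the 63 only becomes visible when the string is encoded again.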

    EDIT: Base64 is the best option if you need text; even better would be to not use text to store binary data at all. It can be done, but not reliably with whatever character encoding happens to be the default, i.e. you have to do the encoding yourself.

    To answer your question literally, you can use your own character encoding. This is a very bad idea as any text is likely to get encoded and mangled in the same way as you have seen. Using Base64 avoids this by using characters which are safe in any encoding.

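    // Fill an array with every possible byte value.
    byte[] bytes = new byte[256];
    for (int i = 0; i < bytes.length; i++)
        bytes[i] = (byte) i;
    // Deprecated String(byte[] ascii, int hibyte) constructor: each char is
    // (hibyte << 8) | (b & 0xff), so no charset decoding takes place.
    String text = new String(bytes, 0);
    // Narrow each char back to a byte; again, no charset is involved.
    byte[] bytes2 = new byte[text.length()];
    for (int i = 0; i < bytes2.length; i++)
        bytes2[i] = (byte) text.charAt(i);
    // Print the index of every byte that did not survive, then the match count.
    int count = 0;
    for (int i = 0; i < bytes2.length; i++)
        if (bytes2[i] != (byte) i)
            System.out.println(i);
        else
            count++;
    System.out.println(count + " bytes matched.");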
    
  • 2020-12-08 14:48

    Use an explicit charset (note that the UTF-16 encoder prepends a byte-order mark, the leading -2, -1 in the output):

    scala> (new String(Array[Byte](1, 2, 3, -1, -2, -127), "utf-16")).getBytes("utf-16")
    res13: Array[Byte] = Array(-2, -1, 1, 2, 3, -1, -2, -127)
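If the goal is an exact round trip through a named charset, a sketch along these lines may be closer to what is wanted. It swaps in ISO-8859-1, which is my choice and not the answer's: that charset maps each byte value 0x00..0xFF to the char with the same code, so nothing is added or replaced. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of the original answer.
class Latin1RoundTrip {
    // One char per byte: byte value b becomes the char with code (b & 0xff).
    static String toText(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Only lossless for strings produced by toText (all chars <= U+00FF).
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Check every possible byte value, not just the six from the question.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++)
            all[i] = (byte) i;
        System.out.println(Arrays.equals(all, toBytes(toText(all)))); // true

        byte[] sample = {1, 2, 3, -1, -2, -127};
        System.out.println(Arrays.toString(toBytes(toText(sample)))); // [1, 2, 3, -1, -2, -127]
    }
}
```

The string is still not meaningful text, and it can be mangled as soon as something else writes it out in a different encoding, so Base64 remains the safer choice for storage or transport.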
    