Java byte[] to/from String conversion

Submitted by 不羁的心 on 2019-12-22 05:53:31

Question


Why does this JUnit test fail?

import org.junit.Assert;
import org.junit.Test;

import java.io.UnsupportedEncodingException;

public class TestBytes {
    @Test
    public void testBytes() throws UnsupportedEncodingException {
        byte[] bytes = new byte[]{0, -121, -80, 116, -62};
        String string = new String(bytes, "UTF-8");
        byte[] bytes2 = string.getBytes("UTF-8");
        System.out.print("bytes2: [");
        for (byte b : bytes2) System.out.print(b + ", ");
        System.out.print("]\n");
        Assert.assertArrayEquals(bytes, bytes2);
    }
}

I assumed the resulting byte array would equal the input array, but somehow, probably because UTF-8 characters can take two bytes, the result differs from the input in both content and length.

Please enlighten me.


Answer 1:


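The reason is that 0, -121, -80, 116, -62 is not a valid UTF-8 byte sequence. new String(bytes, "UTF-8") does not throw an exception in this situation; the Javadoc leaves the behavior for invalid input unspecified. In practice each malformed sequence is replaced with the replacement character U+FFFD, which getBytes("UTF-8") then encodes as the three bytes -17, -65, -67, so the output is longer than the input. See the "Invalid byte sequences" section of http://en.wikipedia.org/wiki/UTF-8.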
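
To make the silent replacement visible, here is a minimal sketch that is not part of the original answer. It assumes a strict check is wanted and uses a CharsetDecoder configured with CodingErrorAction.REPORT, which raises an error on malformed input instead of substituting characters. The class name Utf8ValidityDemo and the helpers isValidUtf8 and lenientRoundTrip are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8ValidityDemo {
    // Hypothetical helper: true only if the bytes are well-formed UTF-8.
    public static boolean isValidUtf8(byte[] input) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(input));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown for stray continuation bytes, truncated sequences, etc.
            return false;
        }
    }

    // Hypothetical helper: the lenient decode/encode round trip from the question.
    public static byte[] lenientRoundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = {0, -121, -80, 116, -62};
        // prints "valid UTF-8: false"
        System.out.println("valid UTF-8: " + isValidUtf8(bytes));
        // prints "[0, -17, -65, -67, -17, -65, -67, 116, -17, -65, -67]"
        System.out.println(Arrays.toString(lenientRoundTrip(bytes)));
    }
}
```

The three invalid bytes (-121, -80 and the truncated lead byte -62) each become one U+FFFD, which accounts for the 11-byte result.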




Answer 2:


The array bytes contains negative values. These have the 8th bit (bit 7) set, and UTF-8 uses such bytes only as parts of multibyte sequences; the ones here do not form valid sequences, so the round trip changes them. bytes2 will be identical to bytes if you use only bytes with values in the range 0..127. To make a copy of bytes as given, one can use, for example, the arraycopy method:

    byte[] bytes3 = new byte[bytes.length];
    System.arraycopy(bytes, 0, bytes3, 0, bytes.length);
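
The 0..127 claim can be checked with a short sketch like the one below. It is not from the original answer: the class name ByteRoundTripDemo and the helpers viaUtf8 and viaBase64 are hypothetical, and the Base64 variant is only one possible option, shown on the assumption that arbitrary bytes really do need to travel through a String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class ByteRoundTripDemo {
    // Hypothetical helper: decode as UTF-8, then encode again.
    // Lossless only when the input is well-formed UTF-8.
    public static byte[] viaUtf8(byte[] input) {
        return new String(input, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: Base64 turns any byte sequence into ASCII text
    // and restores it exactly on decoding.
    public static byte[] viaBase64(byte[] input) {
        String text = Base64.getEncoder().encodeToString(input);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] ascii = {0, 72, 105, 127};             // all values in 0..127
        byte[] arbitrary = {0, -121, -80, 116, -62};  // array from the question

        System.out.println(Arrays.equals(ascii, viaUtf8(ascii)));           // true
        System.out.println(Arrays.equals(arbitrary, viaUtf8(arbitrary)));   // false
        System.out.println(Arrays.equals(arbitrary, viaBase64(arbitrary))); // true
    }
}
```

A plain copy through System.arraycopy stays the simplest choice when no String is needed at all.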


Source: https://stackoverflow.com/questions/16232023/java-byte-to-from-string-conversion
