Convert String to/from byte array without encoding

喜你入骨 submitted on 2019-11-29 15:00:24

Here is sample code that converts a String to a byte array and back to a String without using a character encoding.

public class Test
{

    public static void main(String[] args)
    {
        Test t = new Test();
        t.Test();
    }

    public void Test()
    {
        String input = "Hèllo world";
        byte[] inputBytes = GetBytes(input);
        String output = GetString(inputBytes);
        System.out.println(output);
    }

    public byte[] GetBytes(String str)
    {
        // Two bytes per UTF-16 code unit, high byte first (big-endian).
        char[] chars = str.toCharArray();
        byte[] bytes = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++)
        {
            bytes[i * 2] = (byte) (chars[i] >> 8);  // high 8 bits
            bytes[i * 2 + 1] = (byte) chars[i];     // low 8 bits
        }

        return bytes;
    }

    public String GetString(byte[] bytes)
    {
        // Rebuild each char from two bytes: high byte first, then low byte.
        char[] chars = new char[bytes.length / 2];
        for (int i = 0; i < chars.length; i++)
        {
            // Mask the low byte so its sign extension cannot corrupt the high bits.
            chars[i] = (char) ((bytes[i * 2] << 8) + (bytes[i * 2 + 1] & 0xFF));
        }

        return new String(chars);
    }
}
Simon Laburda
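The hand-written loops above copy each char into two bytes, high byte first. As a minimal sketch of the same idea, the copy can be delegated to a java.nio.ByteBuffer viewed as a CharBuffer, since a ByteBuffer is big-endian by default. The class and method names here (CharBufferCopy, toBytes, fromBytes) are made up for illustration and are not part of the original answer.

```java
import java.nio.ByteBuffer;

final class CharBufferCopy {
    // Copies every char of the string into two bytes, high byte first.
    // No charset is involved, so even unpaired surrogates survive unchanged.
    static byte[] toBytes(String str) {
        ByteBuffer buffer = ByteBuffer.allocate(str.length() * 2); // big-endian by default
        buffer.asCharBuffer().put(str); // the char view writes through to the byte buffer
        return buffer.array();
    }

    // Reads the bytes back as chars, two bytes per char, high byte first.
    static String fromBytes(byte[] bytes) {
        return ByteBuffer.wrap(bytes).asCharBuffer().toString();
    }

    public static void main(String[] args) {
        String input = "H\u00e8llo world"; // \u00e8 is the letter e with a grave accent
        byte[] bytes = toBytes(input);
        System.out.println(fromBytes(bytes).equals(input)); // prints true
    }
}
```

As with the loop version, a trailing odd byte is ignored by fromBytes, because the char view only covers complete two-byte pairs.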

This converts a byte array to a String, filling only the lower 8 bits of each char (the upper 8 bits stay zero).

public static String stringFromBytes(byte[] byteData) {
    char[] charData = new char[byteData.length];
    for (int i = 0; i < charData.length; i++) {
        // Mask to 0..255 so negative byte values do not sign-extend.
        charData[i] = (char) (byteData[i] & 0xFF);
    }
    return new String(charData);
}

The efficiency should be quite good. As Ben Thurley said, if performance really is an issue, don't convert to a String in the first place; work with the byte array instead.

No, you aren't missing anything. There is no easy way to do that because String and char are for text. You apparently don't want to handle your data as text—which would make complete sense if it isn't text. You could do it the hard way that you propose.

An alternative is to assume a character encoding that allows arbitrary sequences of arbitrary byte values (0-255). ISO-8859-1 or IBM437 both qualify. (Windows-1252 only has 251 codepoints. UTF-8 doesn't allow arbitrary sequences.) If you use ISO-8859-1, the resulting string will be the same as your hard way.
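To make the ISO-8859-1 suggestion concrete, here is a small sketch in Java using java.nio.charset.StandardCharsets. It pushes all 256 byte values through ISO-8859-1 and, for contrast, through UTF-8. The class and method names (Latin1RoundTrip, bytesToString, stringToBytes) are placeholders chosen for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1RoundTrip {
    // ISO-8859-1 maps byte value n to the char with code n, for every n in 0..255.
    static String bytesToString(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Inverse mapping; only lossless for strings whose chars are all <= 0xFF.
    static byte[] stringToBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }

        String asText = bytesToString(all);
        System.out.println(asText.length());                            // prints 256
        System.out.println(Arrays.equals(all, stringToBytes(asText)));  // prints true

        // UTF-8 rejects many of these byte sequences and substitutes U+FFFD,
        // so the original bytes cannot be recovered.
        byte[] viaUtf8 = new String(all, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(all, viaUtf8));                // prints false
    }
}
```

Note that stringToBytes is only safe for strings that came from bytesToString: any char above 0xFF is replaced with a substitute byte when encoded as ISO-8859-1.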

As for efficiency, the most efficient way to handle an array of bytes is to keep it as an array of bytes.

Use the deprecated constructor String(byte[] ascii, int hibyte), passing 0 as the high byte:

String string = new String(byteArray, 0);
ScottK
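For the deprecated constructor mentioned above, a short check may help: with hibyte set to 0, each byte becomes the char with the same unsigned value, which is also what decoding as ISO-8859-1 produces. The sketch below compares the two; HiByteDemo and its method names are hypothetical, and the deprecation warning is suppressed on purpose.

```java
import java.nio.charset.StandardCharsets;

final class HiByteDemo {
    // Deprecated since JDK 1.1, but still present: char = (hibyte << 8) | (b & 0xFF).
    @SuppressWarnings("deprecation")
    static String viaDeprecatedConstructor(byte[] data) {
        return new String(data, 0);
    }

    // Non-deprecated route that yields the same chars for a high byte of 0.
    static String viaLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] data = {0x48, (byte) 0xE8, 0x00, (byte) 0xFF};
        System.out.println(viaDeprecatedConstructor(data).equals(viaLatin1(data))); // prints true
    }
}
```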

A String is already encoded as Unicode (UTF-16). UTF-16 means that it can take up to two char values to make one Unicode code point. What you really want to use (in C#/.NET) is:

byte[] bytes = System.Text.Encoding.Unicode.GetBytes(myString); 

to convert a String to an array of bytes. This does what you did above, but is about 10 times faster. If you would like to cut the transmitted data nearly in half (for mostly ASCII text), I recommend converting to UTF-8 instead (ASCII is a subset of UTF-8), the encoding used for roughly 90% of content on the internet, by calling:

byte[] bytes = Encoding.UTF8.GetBytes(myString);

To convert back to a string use:

String myString = Encoding.Unicode.GetString(bytes); 

or

String myString = Encoding.UTF8.GetString(bytes);
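The snippets above use the .NET Encoding class. Since the rest of this page is Java, here is a rough Java counterpart as a sketch: StandardCharsets.UTF_16LE stands in for Encoding.Unicode (which is little-endian UTF-16 in .NET) and StandardCharsets.UTF_8 for Encoding.UTF8. The class and method names (CharsetSizes, utf16, utf8) are invented for this example.

```java
import java.nio.charset.StandardCharsets;

final class CharsetSizes {
    // Counterpart of Encoding.Unicode.GetBytes: UTF-16, little-endian, no byte-order mark.
    static byte[] utf16(String text) {
        return text.getBytes(StandardCharsets.UTF_16LE);
    }

    // Counterpart of Encoding.UTF8.GetBytes.
    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "H\u00e8llo world"; // 11 chars, one of them outside ASCII

        byte[] wide = utf16(text);
        byte[] compact = utf8(text);
        System.out.println(wide.length);    // prints 22
        System.out.println(compact.length); // prints 12

        // Decoding with the matching charset restores the original text.
        System.out.println(new String(wide, StandardCharsets.UTF_16LE).equals(text));  // prints true
        System.out.println(new String(compact, StandardCharsets.UTF_8).equals(text));  // prints true
    }
}
```

Unlike the char-by-char copy, both charsets only round-trip well-formed text; they are meant for strings, not for carrying arbitrary binary data.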