String to binary and vice versa: extended ASCII

Posted by 隐身守侯 on 2019-12-10 11:28:57

Question


I want to convert a String to binary by putting it in a byte array (String.getBytes()) and then storing the binary string for each byte (Integer.toBinaryString(bytearray[i])) in a String[]. Then I want to convert back to a normal String via Byte.parseByte(stringarray[i], 2). This works well for the standard ASCII table, but not for the extended one. For example, an A gives me 1000001, but an Ä returns

11111111111111111111111111000011
11111111111111111111111110000100

Any ideas how to manage this?

public class BinString {
    public static void main(String args[]) {
        String s = "ä";
        System.out.println(binToString(stringToBin(s)));

    }

    public static String[] stringToBin(String s) {
        System.out.println("Converting: " + s);
        byte[] b = s.getBytes();
        String[] sa = new String[s.getBytes().length];
        for (int i = 0; i < b.length; i++) {
            sa[i] = Integer.toBinaryString(b[i] & 0xFF);
        }
        return sa;
    }

    public static String binToString(String[] strar) {
        byte[] bar = new byte[strar.length];
        for (int i = 0; i < strar.length; i++) {
            bar[i] = Byte.parseByte(strar[i], 2);
            System.out.println(Byte.parseByte(strar[i], 2));

        }
        String s = new String(bar);
        return s;
    }

}

Answer 1:


First off: "extended ASCII" is a very misleading term that's used to refer to a ton of different encodings.

Second: byte in Java is signed, while bytes in encodings are usually treated as unsigned. Because you pass the byte to Integer.toBinaryString(), it is first widened to an int using sign extension, and byte values above 127 are represented by negative values in Java. That is where the 24 leading 1 bits in your output come from.

To avoid this, use & 0xFF to mask off everything but the lower 8 bits, like this:

String binary = Integer.toBinaryString(byteArray[i] & 0xFF);
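
The mask covers the String-to-binary direction. For the way back, note that Byte.parseByte only accepts values from -128 to 127, so an unsigned pattern such as 11000011 (195) is rejected. The following is a minimal sketch of one way to handle both directions; the class name ByteBits and its two helper methods are hypothetical and not part of the original post.

```java
// Hypothetical helper class; the names are illustrative only.
class ByteBits {
    // Mask to the range 0..255 before formatting, then left-pad to 8 digits.
    static String toBinary(byte b) {
        String bits = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    // Parse as an int (0..255 fits easily), then narrow to byte.
    // The cast keeps the low 8 bits, which restores the signed value.
    static byte fromBinary(String bits) {
        return (byte) Integer.parseInt(bits, 2);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xC3; // first UTF-8 byte of "Ä", -61 as a signed byte

        System.out.println(Integer.toBinaryString(b)); // sign-extended, 32 digits
        System.out.println(toBinary(b));               // 11000011
        System.out.println(fromBinary("11000011"));    // -61

        try {
            Byte.parseByte("11000011", 2);             // 195 is outside -128..127
        } catch (NumberFormatException e) {
            System.out.println("Byte.parseByte rejected 11000011");
        }
    }
}
```

Padding to 8 digits is optional for the round trip, but it keeps every entry the same width, which helps if the strings are later concatenated.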



Answer 2:


To expand on Joachim's point about "extended ASCII" I'd add...

Note that getBytes() is a transcoding operation that converts data from UTF-16 to the platform default encoding. That encoding varies from system to system and sometimes even between users on the same PC. This means that results are not consistent across platforms, and if a legacy encoding is the default (as it traditionally has been on Windows), data can be lost.
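
To make the dependence on the encoding concrete, here is a small sketch that encodes the same one-character string with several charsets and prints the resulting bytes in hex. The class name CharsetBytes and the hex helper are made up for this illustration; the charsets come from java.nio.charset.StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class CharsetBytes {
    // Space-separated hex dump of the bytes a charset produces for a string.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // "Ä", written as an escape so the source file encoding does not matter.
        String s = "\u00C4";

        System.out.println(hex(s, StandardCharsets.ISO_8859_1)); // C4
        System.out.println(hex(s, StandardCharsets.UTF_8));      // C3 84
        System.out.println(hex(s, StandardCharsets.UTF_16));     // FE FF 00 C4 (with byte order mark)
        System.out.println(hex(s, StandardCharsets.US_ASCII));   // 3F, i.e. '?'

        // US-ASCII cannot represent the character, so the round trip loses it.
        byte[] lossy = s.getBytes(StandardCharsets.US_ASCII);
        String back = new String(lossy, StandardCharsets.US_ASCII);
        System.out.println(back.equals(s)); // false
    }
}
```

The two bytes C3 84 are the same ones that appear, sign-extended, in the output shown in the question, which suggests the asker's default encoding was UTF-8.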

To make the operation symmetrical, you need to provide an encoding explicitly, preferably a Unicode encoding such as UTF-8 or UTF-16:

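// requires: import java.nio.charset.Charset;
String s1 = "Ä";                               // any input string
Charset encoding = Charset.forName("UTF-16");
byte[] b = s1.getBytes(encoding);              // encode with an explicit charset
String s2 = new String(b, encoding);           // decode with the same charset
assert s1.equals(s2);                          // checked only when run with -ea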
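
Putting the two answers together, the program from the question could be reworked roughly as follows. This is a sketch rather than a definitive fix: the class name BinStringUtf8 is hypothetical, UTF-8 is just one reasonable choice of explicit charset, and replacing Byte.parseByte with a cast from Integer.parseInt is an addition not spelled out in either answer.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical reworking of the BinString class from the question.
class BinStringUtf8 {
    // Encode with an explicit charset, then format each byte as unsigned binary.
    static String[] stringToBin(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        String[] out = new String[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = Integer.toBinaryString(bytes[i] & 0xFF);
        }
        return out;
    }

    // Parse each entry as an int, narrow it to a byte, and decode with the same charset.
    static String binToString(String[] bits) {
        byte[] bytes = new byte[bits.length];
        for (int i = 0; i < bits.length; i++) {
            bytes[i] = (byte) Integer.parseInt(bits[i], 2);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "A\u00C4"; // "AÄ", escaped to avoid source encoding issues
        String[] bits = stringToBin(s);
        System.out.println(String.join(" ", bits));      // 1000001 11000011 10000100
        System.out.println(binToString(bits).equals(s)); // true
    }
}
```

Because the same charset is named on both sides, the result no longer depends on the platform default.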


Source: https://stackoverflow.com/questions/5535988/string-to-binary-and-vice-versa-extended-ascii
