Converting 4 bytes to an unsigned 32-bit integer and storing it in a long

Anonymous (unverified), submitted 2019-12-03 08:54:24

Question:

I'm trying to read a binary file in Java. I need methods to read unsigned 8-bit, unsigned 16-bit and unsigned 32-bit values. What would be the best way (fastest, nicest-looking code) to do this? I've done this in C++ with something like this:

uint8_t *buffer;
uint32_t value = buffer[0] | buffer[1] << 8 | buffer[2] << 16 | buffer[3] << 24;

But in Java this causes a problem if, for example, buffer[1] contains a value with its sign bit set, because the result of a left shift is an int (?). Instead of OR-ing in only 0xA5 at the specific place, it ORs in 0xFFFFA500 or something like that, which "damages" the two top bytes.

I have code right now that looks like this:

public long getUInt32() throws EOFException, IOException {
    byte[] bytes = getBytes(4);
    long value = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
    return value & 0x00000000FFFFFFFFL;
}

If I want to convert the four bytes 0x67 0xA5 0x72 0x50 the result is 0xFFFFA567 instead of 0x5072A567.
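To see where 0xFFFFA567 comes from, here is a minimal sketch that traces the promotion step by step. The class name SignExtensionTrace and the method brokenCombine are made up for illustration; the expression and the byte values are the ones from the question.

```java
class SignExtensionTrace {
    // Same expression as in getUInt32() above, minus the I/O.
    static long brokenCombine(byte[] bytes) {
        long value = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
        return value & 0x00000000FFFFFFFFL;
    }

    public static void main(String[] args) {
        byte b1 = (byte) 0xA5;   // stored as -91, because byte is signed
        int promoted = b1;       // widening to int sign-extends the byte
        int shifted = b1 << 8;   // the byte is promoted before the shift happens

        System.out.println(Integer.toHexString(promoted)); // ffffffa5
        System.out.println(Integer.toHexString(shifted));  // ffffa500

        byte[] bytes = {0x67, (byte) 0xA5, 0x72, 0x50};
        System.out.println(Long.toHexString(brokenCombine(bytes))); // ffffa567
    }
}
```

The ones in the upper half of 0xFFFFA500 overwrite whatever bytes[2] and bytes[3] contribute, and the final mask to 32 bits cannot undo that.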

Edit: This works great:

public long getUInt32() throws EOFException, IOException {
    byte[] bytes = getBytes(4);
    long value = bytes[0] & 0xFF;
    value |= (bytes[1] << 8) & 0xFFFF;
    value |= (bytes[2] << 16) & 0xFFFFFF;
    // The last mask must be a long literal: the int literal 0xFFFFFFFF is -1 and masks nothing.
    value |= (bytes[3] << 24) & 0xFFFFFFFFL;
    return value;
}

But isn't there a better way to do this? 10 bit operations seem a "bit" much for a simple thing like this. (See what I did there?) =)

Answer 1:

You've got the right idea; I don't think there's any obvious improvement. If you look at the java.io.DataInput.readInt spec, it has code for the same thing. It applies & before << instead of after, but otherwise it's the standard approach.

There is no way to read an int in one go from a byte array, unless you use a memory-mapped region, which is way overkill for this.

Of course, you could use a DataInputStream directly instead of reading into a byte[] first:

DataInputStream d = new DataInputStream(new FileInputStream("myfile"));
d.readInt();

DataInputStream reads big-endian values, the opposite of the little-endian order you are using, so you'll also need some Integer.reverseBytes calls. It won't be any faster, but it's cleaner.
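A minimal sketch of that approach might look like the following. The class name LittleEndianReader and the helper names are invented for this example, and a ByteArrayInputStream stands in for the file so the snippet runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class LittleEndianReader {
    // Reads four bytes as a little-endian unsigned 32-bit value.
    static long readUInt32LE(DataInputStream in) throws IOException {
        int bigEndian = in.readInt();                       // DataInputStream assembles big-endian
        int littleEndian = Integer.reverseBytes(bigEndian); // swap to the file's byte order
        return littleEndian & 0xFFFFFFFFL;                  // widen to long without sign extension
    }

    // Reads two bytes as a little-endian unsigned 16-bit value.
    static int readUInt16LE(DataInputStream in) throws IOException {
        return Short.reverseBytes(in.readShort()) & 0xFFFF;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x67, (byte) 0xA5, 0x72, 0x50};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(Long.toHexString(readUInt32LE(in))); // 5072a567
    }
}
```

For unsigned 8-bit values, DataInputStream already provides readUnsignedByte, which returns an int in the range 0 to 255.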



Answer 2:

The problem with your sample code is that when a byte is implicitly widened to an int or a long, Java does so with sign extension: if the top bit of the byte is 1, the extra bits are padded with ones instead of zeros. By using a conversion to long that prevents sign extension, your code can work perfectly.

public static long byteAsULong(byte b) {
    return ((long) b) & 0x00000000000000FFL;
}

public static long getUInt32(byte[] bytes) {
    long value = byteAsULong(bytes[0])
            | (byteAsULong(bytes[1]) << 8)
            | (byteAsULong(bytes[2]) << 16)
            | (byteAsULong(bytes[3]) << 24);
    return value;
}

You can use the signed values to contain bits if you are careful. The things you need to avoid are any form of signed operations, such as arithmetic and signed bit shifting. If you need to print the values out as numbers, realize that the default Java ways to do it (Integer.toString, string concatenation and so on) will make large unsigned numbers appear negative.
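As a small illustration of the printing issue, here is a sketch that assumes Java 8 or later, where Integer gained toUnsignedString and toUnsignedLong. The class name UnsignedPrinting and the format method are hypothetical.

```java
class UnsignedPrinting {
    // Formats the same 32 bits three ways: signed decimal, unsigned decimal, hex.
    static String format(int bits) {
        return "signed=" + bits
                + " unsigned=" + Integer.toUnsignedString(bits)
                + " hex=" + Integer.toHexString(bits);
    }

    public static void main(String[] args) {
        int bits = 0xFFFFA567;
        System.out.println(format(bits));
        // signed=-23193 unsigned=4294944103 hex=ffffa567

        // Widening to long without sign extension gives the same unsigned value.
        System.out.println(Integer.toUnsignedLong(bits)); // 4294944103
    }
}
```

On older Java versions, masking with 0xFFFFFFFFL before printing achieves the same result.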

The most important thing to know, though, is how bit shifting behaves. When shifting right, the >> operator maintains the sign of the number in two's complement. This means that if the leftmost bit is a 1, the bits shifted in will be ones instead of zeros. The good news is that Java at least has an unsigned shift operator, >>>, which always shifts in zeros. Example:

int bits = 0xF0000000;
int shifted = bits >>> 4; // 0x0F000000; bits >> 4 would give 0xFF000000
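One caveat worth spelling out: >>> only shifts in zeros at the int (or long) level, so a byte is still sign-extended before the shift takes place. The following sketch, with the made-up class name ShiftDemo and made-up helper names, compares the cases.

```java
class ShiftDemo {
    static int arithmeticShift(int bits, int n) { return bits >> n; }  // copies the sign bit in
    static int logicalShift(int bits, int n)    { return bits >>> n; } // always shifts zeros in

    // Upper four bits of a byte, treating the byte as unsigned.
    static int highNibble(byte b) { return (b & 0xFF) >>> 4; }

    public static void main(String[] args) {
        int bits = 0xF0000000;
        System.out.println(Integer.toHexString(arithmeticShift(bits, 4))); // ff000000
        System.out.println(Integer.toHexString(logicalShift(bits, 4)));    // f000000 (leading zero not printed)

        byte b = (byte) 0xA5;
        System.out.println(Integer.toHexString(b >>> 4));       // ffffffa, the byte was sign-extended first
        System.out.println(Integer.toHexString(highNibble(b))); // a
    }
}
```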

Always remember that what a pile of bits means is a matter of interpretation. Even though Java's built-in methods all treat the bits as two's complement, if you do not use any of them, the signed bytes contain exactly the same bits that you put into them.



Answer 3:

A more regular version converts the bytes to their unsigned values as integers first:

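public long getUInt32() throws EOFException, IOException {
    byte[] bytes = getBytes(4);
    // Combine as an int, then mask while widening so a set top bit is not sign-extended into the long.
    int value =
        ((bytes[0] & 0xFF) <<  0) |
        ((bytes[1] & 0xFF) <<  8) |
        ((bytes[2] & 0xFF) << 16) |
        ((bytes[3] & 0xFF) << 24);
    return value & 0xFFFFFFFFL;
}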

Don't get hung up on the number of bit operations; most likely the compiler will optimize those to byte operations.

Also, you shouldn't be using long for 32-bit values just to avoid the sign; you can use int and ignore the fact that it is signed most of the time. See this answer.
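As a rough sketch of what keeping the value in an int can look like (this is not taken from the linked answer), the example below assumes Java 8 or later for the unsigned helpers on Integer. The class name UnsignedIntOps and the method readUInt32AsInt are hypothetical.

```java
class UnsignedIntOps {
    // Combines four little-endian bytes into an int whose bits are the unsigned value.
    static int readUInt32AsInt(byte[] b) {
        return (b[0] & 0xFF)
                | (b[1] & 0xFF) << 8
                | (b[2] & 0xFF) << 16
                | (b[3] & 0xFF) << 24;
    }

    public static void main(String[] args) {
        byte[] allOnes = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        int a = readUInt32AsInt(allOnes); // bit pattern 0xFFFFFFFF, prints as -1

        System.out.println(a + 1);                              // 0, wraps exactly like a uint32
        System.out.println(a / 2);                              // 0, signed division gets it wrong
        System.out.println(Integer.divideUnsigned(a, 2));       // 2147483647
        System.out.println(Integer.compareUnsigned(a, 1) > 0);  // true
        System.out.println(Integer.toUnsignedLong(a));          // 4294967295
    }
}
```

Addition, subtraction, multiplication and the bitwise operators give the same bits for signed and unsigned operands; only division, remainder, comparison, right shifts and printing need the unsigned variants.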


