dealing with endianness in c++

非 Y 不嫁゛ 提交于 2019-12-18 04:26:16

问题


I am working on translating a system from python to c++. I need to be able to perform actions in c++ that are generally performed by using Python's struct.unpack (interpreting binary strings as numerical values). For integer values, I am able to get this to (sort of) work, using the data types in stdint.h:

struct.unpack("i", str) ==> *(int32_t*) str; //str is a char* containing the data

This works properly for little-endian binary strings, but fails on big-endian binary strings. Basically, I need an equivalent to using the > tag in struct.unpack:

struct.unpack(">i", str) ==> ???

Please note, if there is a better way to do this, I am all ears. However, I cannot use c++11, nor any 3rd party libraries other than Boost. I will also need to be able to interpret floats and doubles, as in struct.unpack(">f", str) and struct.unpack(">d", str), but I'll get to that when I solve this.

NOTE I should point out that the endianness of my machine is irrelevant in this case. I know that the bitstream I receive in my code will ALWAYS be big-endian, and that's why I need a solution that will always cover the big-endian case. The article pointed out by BoBTFish in the comments seems to offer a solution.


回答1:


Unpack the string one byte at a time.

unsigned char *str;
unsigned int result;

result =  *str++ << 24;
result |= *str++ << 16;
result |= *str++ << 8;
result |= *str++;



回答2:


For 32 and 16-bit values:

This is exactly the problem you have for network data, which is big-endian. You can use the the ntohl to turn a 32-bit into host order, little-endian in your case.

The ntohl() function converts the unsigned integer netlong from network byte order to host byte order.

int res = ntohl(*((int32_t) str)));

This will also take care of the case where your host is big-endian and won't do anything.

For 64-bit values

Non-standardly on linux/BSD you can take a look at 64 bit ntohl() in C++?, which points to htobe64

These functions convert the byte encoding of integer values from the byte order that the current CPU (the "host") uses, to and from little-endian and big-endian byte order.

For windows try: How do I convert between big-endian and little-endian values in C++?

Which points to _byteswap_uint64 and as well as a 16 and 32-bit solution and a gcc-specific __builtin_bswap(32/64) call.

Other Sizes

Most systems don't have values that aren't 16/32/64 bits long. At that point I might try to store it in a 64-bit value, shift it and they translate. I'd write some good tests. I suspectt is an uncommon situation and more details would help.




回答3:


First, the cast you're doing:

char *str = ...;
int32_t i = *(int32_t*)str;

results in undefined behavior due to the strict aliasing rule (unless str is initialized with something like int32_t x; char *str = (char*)&x;). In practical terms that cast can result in an unaligned read which causes a bus error (a crash) on some platforms and slow performance on others.

Instead you should be doing something like:

int32_t i;
std::memcpy(&i, c, sizeof(i));

There are a number of functions for swapping bytes between the host's native byte ordering and a host independent ordering: ntoh*(), hton*(), where * is nothing, l, or s for the different types supported. Since different hosts may have different byte orderings then this may be what you want to use if the data you're reading uses a consistent serialized form on all platforms.

ntoh(i);

You can also manually move bytes around in str before copying it into the integer.

std::swap(str[0],str[3]);
std::swap(str[1],str[2]);
std::memcpy(&i,str,sizeof(i));

Or you can manually manipulate the integer's value using shifts and bitwise operators.

std::memcpy(&i,str,sizeof(i));
i = (i&0xFFFF0000)>>16 | (i&0x0000FFFF)<<16;
i = (i&0xFF00FF00)>>8  | (i&0x00FF00FF)<<8;



回答4:


This falls in the realm of bit twiddling.

for (i=0;i<sizeof(struct foo);i++) dst[i] = src[i ^ mask]; 

where mask == (sizeof type -1) if the stored and native endianness differ.

With this technique one can convert a struct to bit masks:

 struct foo {
    byte a,b;       //  mask = 0,0
    short e;        //  mask = 1,1
    int g;          //  mask = 3,3,3,3,
    double i;       //  mask = 7,7,7,7,7,7,7,7
 } s; // notice that all units must be aligned according their native size

Again these masks can be encoded with two bits per symbol: (1<<n)-1, meaning that in 64-bit machines one can encode necessary masks of a 32 byte sized struct in a single constant (with 1,2,4 and 8 byte alignments).

unsigned int mask = 0xffffaa50;  // or zero if the endianness matches
for (i=0;i<16;i++) { 
     dst[i]=src[i ^ ((1<<(mask & 3))-1]; mask>>=2;
}



回答5:


If your as received values are truly strings, (char* or std::string) and you know their format information, sscanf(), and atoi(), well, really ato() will be your friends. They take well formatted strings and convert them per passed-in formats (kind of reverse printf).



来源:https://stackoverflow.com/questions/13863667/dealing-with-endianness-in-c

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!