I am working on translating a system from Python to C++. I need to be able to perform actions in C++ that are generally performed using Python's struct.unpack.
If the values you receive are truly text strings (char* or std::string) and you know their format, then sscanf() and atoi() (really the whole ato*() family) will be your friends. They take well-formatted strings and convert them according to the format you pass in (a kind of reverse printf).
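As a minimal sketch of the text-parsing route, assuming the input looks like "42 3.5": the Reading struct and both function names are hypothetical, and strtol is swapped in for atoi only because it can report a failed conversion.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical record type, used only for illustration.
struct Reading {
    int id;
    double value;
};

// Parses text such as "42 3.5" into a Reading.
// Returns false unless both fields were converted.
bool parse_reading(const std::string& text, Reading& out) {
    // sscanf returns the number of fields it converted successfully.
    return std::sscanf(text.c_str(), "%d %lf", &out.id, &out.value) == 2;
}

// strtol reports where parsing stopped, which atoi cannot do.
bool parse_int(const std::string& text, long& out) {
    char* end = nullptr;
    out = std::strtol(text.c_str(), &end, 10);
    return end != text.c_str();  // false when no digits were consumed
}
```

Checking the return value matters here: atoi gives 0 for both "0" and "abc", so it cannot distinguish a valid zero from garbage.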
This falls in the realm of bit twiddling.
for (i=0;i<sizeof(struct foo);i++) dst[i] = src[i ^ mask];
where mask == sizeof(type) - 1 if the stored and native endianness differ, and 0 if they match.
With this technique one can convert a struct to bit masks:
struct foo {
byte a,b; // mask = 0,0
short e; // mask = 1,1
int g; // mask = 3,3,3,3,
double i; // mask = 7,7,7,7,7,7,7,7
} s; // notice that all units must be aligned according to their native size
Again these masks can be encoded with two bits per symbol, n, standing for (1<<n)-1, meaning that on 64-bit machines one can encode the necessary masks of a 32-byte struct in a single constant (with 1, 2, 4 and 8 byte alignments).
unsigned int mask = 0xffffaa50; // or zero if the endianness matches
for (i=0;i<16;i++) {
dst[i]=src[i ^ ((1<<(mask & 3))-1)]; mask>>=2;
}
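The loop above can be packaged as a small function. This is only a sketch under the same assumptions as the struct foo example (16 bytes, every field aligned to its own size); the name swap_fields is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Copies 16 bytes from src to dst, reversing the bytes of each aligned field.
// Every pair of bits in mask holds n for one byte, where the field that
// contains the byte is 2^n bytes wide. XOR-ing the index with (1 << n) - 1
// mirrors the byte within its field. A mask of 0 copies the bytes unchanged.
void swap_fields(const unsigned char* src, unsigned char* dst, std::uint32_t mask) {
    for (std::size_t i = 0; i < 16; ++i) {
        dst[i] = src[i ^ ((1u << (mask & 3u)) - 1u)];
        mask >>= 2;
    }
}
```

With mask 0xffffaa50 the two single bytes stay in place, the two bytes of the short trade places, and the int and double are each reversed end to end.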
First, the cast you're doing:
char *str = ...;
int32_t i = *(int32_t*)str;
results in undefined behavior due to the strict aliasing rule (unless str is initialized with something like int32_t x; char *str = (char*)&x;). In practical terms that cast can result in an unaligned read, which causes a bus error (a crash) on some platforms and slow performance on others.
Instead you should be doing something like:
int32_t i;
std::memcpy(&i, str, sizeof(i));
There are a number of functions for swapping bytes between the host's native byte ordering and a host-independent (network, big-endian) ordering: ntoh*() and hton*(), where * is l or s for 32-bit and 16-bit values respectively. Since different hosts may have different byte orderings, these may be what you want if the data you're reading uses a consistent serialized form on all platforms.
i = ntohl(i);
You can also manually move bytes around in str before copying it into the integer.
std::swap(str[0],str[3]);
std::swap(str[1],str[2]);
std::memcpy(&i,str,sizeof(i));
Or you can manually manipulate the integer's value using shifts and bitwise operators.
std::memcpy(&i,str,sizeof(i));
i = (i&0xFFFF0000)>>16 | (i&0x0000FFFF)<<16;
i = (i&0xFF00FF00)>>8 | (i&0x00FF00FF)<<8;
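Putting the memcpy and the shift-based swap together, a sketch might look like this. The names load_u32 and bswap32 are hypothetical, and an unsigned type is used so the shifts never touch a sign bit.

```cpp
#include <cstdint>
#include <cstring>

// Reads 4 bytes in host order without aliasing or alignment problems.
std::uint32_t load_u32(const char* str) {
    std::uint32_t i;
    std::memcpy(&i, str, sizeof(i));
    return i;
}

// Reverses the byte order of a 32-bit value using shifts and masks.
std::uint32_t bswap32(std::uint32_t i) {
    i = (i & 0xFFFF0000u) >> 16 | (i & 0x0000FFFFu) << 16;  // swap 16-bit halves
    i = (i & 0xFF00FF00u) >> 8 | (i & 0x00FF00FFu) << 8;    // swap bytes in each half
    return i;
}
```

Calling bswap32(load_u32(str)) is only the right thing to do when the stored and native byte orders differ, so the caller still has to know both.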
For 32 and 16-bit values:
This is exactly the problem you have with network data, which is big-endian. You can use ntohl to turn a 32-bit value into host order, which is little-endian in your case.
The ntohl() function converts the unsigned integer netlong from network byte order to host byte order.
int res = ntohl(*((uint32_t*) str)); // assumes str is suitably aligned for uint32_t
This also covers the case where your host is big-endian: there ntohl does nothing.
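A hedged sketch of the same idea that copies the bytes out first instead of casting the pointer. The helper names read_be32 and read_be16 are hypothetical, and the header shown is the POSIX one.

```cpp
#include <arpa/inet.h>  // ntohl, ntohs (POSIX; Windows declares them in <winsock2.h>)
#include <cstdint>
#include <cstring>

// Reads a 32-bit big-endian (network order) value from a byte buffer.
std::uint32_t read_be32(const unsigned char* p) {
    std::uint32_t raw;
    std::memcpy(&raw, p, sizeof(raw));  // no alignment requirement on p
    return ntohl(raw);
}

// Reads a 16-bit big-endian (network order) value from a byte buffer.
std::uint16_t read_be16(const unsigned char* p) {
    std::uint16_t raw;
    std::memcpy(&raw, p, sizeof(raw));
    return ntohs(raw);
}
```

This roughly corresponds to Python's struct.unpack("!I", data) and struct.unpack("!H", data).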
For 64-bit values
On Linux/BSD there are non-standard functions for this; see the question "64 bit ntohl() in C++?", which points to htobe64 and its relatives (be64toh converts from big-endian back to host order).
These functions convert the byte encoding of integer values from the byte order that the current CPU (the "host") uses, to and from little-endian and big-endian byte order.
For Windows, see the question "How do I convert between big-endian and little-endian values in C++?", which points to _byteswap_uint64, along with 16 and 32-bit equivalents and the GCC-specific __builtin_bswap32 and __builtin_bswap64 calls.
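For the Linux case, a minimal sketch using be64toh from glibc's <endian.h> could look like the following. The helper name read_be64 is hypothetical, and on the BSDs the header is <sys/endian.h> instead.

```cpp
#include <endian.h>  // be64toh (glibc, non-standard; <sys/endian.h> on the BSDs)
#include <cstdint>
#include <cstring>

// Reads a 64-bit big-endian value from a byte buffer into host order.
std::uint64_t read_be64(const unsigned char* p) {
    std::uint64_t raw;
    std::memcpy(&raw, p, sizeof(raw));  // avoids aliasing and alignment issues
    return be64toh(raw);                // no-op on big-endian hosts
}
```

This roughly corresponds to Python's struct.unpack(">Q", data).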
Other Sizes
Most systems don't have values that aren't 16/32/64 bits long. If you do, I might try to store the value in a 64-bit integer, shift it and then translate. I'd write some good tests. I suspect this is an uncommon situation, and more details would help.
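One way to sketch the "store it in a 64-bit value and shift" idea, assuming the odd-sized field is big-endian and at most 8 bytes wide. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Reads an n-byte big-endian unsigned value (1 <= n <= 8) into 64 bits.
std::uint64_t read_be_uint(const unsigned char* p, std::size_t n) {
    std::uint64_t value = 0;
    for (std::size_t k = 0; k < n; ++k) {
        value = (value << 8) | p[k];  // shift in one byte at a time
    }
    return value;
}

// Interprets the low `bits` bits of value (1 <= bits <= 64) as a
// two's complement signed number.
std::int64_t sign_extend(std::uint64_t value, unsigned bits) {
    const std::uint64_t sign = std::uint64_t{1} << (bits - 1);
    value &= (sign << 1) - 1;  // keep only the low `bits` bits
    return static_cast<std::int64_t>((value ^ sign) - sign);
}
```

For a signed 24-bit field, for example, sign_extend(read_be_uint(p, 3), 24) gives the value as an ordinary int64_t.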
Unpack the string one byte at a time.
unsigned char *str;
unsigned int result;
result = (unsigned int)*str++ << 24;
result |= *str++ << 16;
result |= *str++ << 8;
result |= *str++;
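The byte-at-a-time approach extends naturally to something that reads like struct.unpack. The sketch below uses hypothetical helpers get_be32 and get_be16 that each advance a cursor, and it assumes the caller has already checked that enough bytes remain.

```cpp
#include <cstdint>

// Reads a big-endian 32-bit value and advances p past it.
std::uint32_t get_be32(const unsigned char*& p) {
    std::uint32_t v = static_cast<std::uint32_t>(p[0]) << 24 |
                      static_cast<std::uint32_t>(p[1]) << 16 |
                      static_cast<std::uint32_t>(p[2]) << 8 |
                      static_cast<std::uint32_t>(p[3]);
    p += 4;
    return v;
}

// Reads a big-endian 16-bit value and advances p past it.
std::uint16_t get_be16(const unsigned char*& p) {
    std::uint16_t v = static_cast<std::uint16_t>(p[0] << 8 | p[1]);
    p += 2;
    return v;
}
```

Calling get_be32 and then get_be16 on the same cursor is roughly what struct.unpack(">IH", data) does in Python, and the result is the same on any host because the code never depends on the native byte order.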