I\'m curious about conventions for type-punning pointers/arrays in C++. Here\'s the use case I have at the moment:
Compute a simple 32-bit checksum over a
Ignoring efficiency, for simplicity of code I'd do:
#include
#include
#include
uint32_t compute_checksum(const char *data, size_t size) {
std::vector intdata(size/sizeof(uint32_t));
std::memcpy(&intdata[0], data, size);
return std::accumulate(intdata.begin(), intdata.end(), 0);
}
I also like litb's last answer, the one that shifts each char in turn, except that since char might be signed, I think it needs an extra mask:
checksum += ((data[i] && 0xFF) << shift[i % 4]);
When type punning is a potential issue, I prefer not to type pun rather than to try to do so safely. If you don't create any aliased pointers of distinct types in the first place, then you don't have to worry what the compiler might do with aliases, and neither does the maintenance programmer who sees your multiple static_casts through a union.
If you don't want to allocate so much extra memory, then:
uint32_t compute_checksum(const char *data, size_t size) {
uint32_t total = 0;
for (size_t i = 0; i < size; i += sizeof(uint32_t)) {
uint32_t thisone;
std::memcpy(&thisone, &data[i], sizeof(uint32_t));
total += thisone;
}
return total;
}
Enough optimisation will get rid of the memcpy and the extra uint32_t variable entirely on gcc, and just read an integer value unaligned, in whatever the most efficient way to do that is on your platform, straight out of the source array. I'd hope the same is true of other "serious" compilers. But this code is now bigger than litb's, so there's not much to be said for it other than mine is easier to turn into a function template that will work just as well with uint64_t, and mine works as native endian-ness rather than picking little-endian.
This is of course not completely portable. It assumes that the storage representation of sizeof(uint32_t) chars corresponds to the storage representation of a uin32_t in the way we want. This is implied by the question, since it states that one can be "treated as" the other. Endian-ness, whether a char is 8 bits, and whether uint32_t uses all bits in its storage representation can obviously intrude, but the question implies that they won't.