On 32-bit CPUs, is an 'integer' type more efficient than a 'short' type?

离开以前 2020-11-29 09:30

On a 32-bit CPU, an integer is 4 bytes and a short integer is 2 bytes. If I am writing a C/C++ application that uses many numeric values that will always fit within the pro

8 Answers
  •  粉色の甜心
    2020-11-29 10:13

    A 32-bit CPU usually operates on 32-bit values internally, but that does not make it slower when it performs the same operation on an 8-bit or 16-bit value. x86, for example, is still backward compatible all the way to the 8086 and can operate on parts of a register. Even though a register is 32 bits wide, an instruction can work on just its low 16 bits or low 8 bits, generally without any slowdown. x86_64 adopted the same concept: its registers are 64 bits wide, yet instructions can still operate on only the low 32, 16, or 8 bits.

    Also, x86 CPUs always load a whole cache line from memory if the data is not already cached, and a cache line is much bigger than 4 bytes (typically 16 to 64 bytes on 32-bit x86 CPUs), so loading 2 bytes from memory is just as fast as loading 4 bytes. When you process many values from memory, 16-bit values may actually be much faster than 32-bit values because fewer memory transfers are needed. If a cache line is 32 bytes, it holds sixteen 16-bit values but only eight 32-bit values. With 16-bit ints you get one memory access every sixteen values; with 32-bit ints you get one every eight, which means twice as many transfers when processing a large int array.
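    The cache line arithmetic above can be sketched in a few lines of C. This is only an illustration: the `CACHE_LINE_BYTES` constant and the helper names `elems_per_line` and `lines_touched` are made up for this example, and the line size is an assumed value, not something queried from the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size in bytes. 64 is common on recent x86 parts,
   but this is an illustrative constant, not a detected value. */
#define CACHE_LINE_BYTES 64u

/* How many elements of the given size fit into one cache line. */
static size_t elems_per_line(size_t elem_size)
{
    assert(elem_size > 0 && elem_size <= CACHE_LINE_BYTES);
    return CACHE_LINE_BYTES / elem_size;
}

/* How many cache lines are touched when streaming through `count`
   contiguous elements, assuming the array starts on a line boundary. */
static size_t lines_touched(size_t count, size_t elem_size)
{
    size_t bytes = count * elem_size;
    return (bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}
```

    Calling `lines_touched(n, sizeof(short))` and `lines_touched(n, sizeof(int))` for the same `n` shows the roughly 2:1 ratio in memory traffic on a platform where `short` is 2 bytes and `int` is 4 bytes.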

    Other CPUs, PPC for example, cannot operate on part of a register; they always process the full register. These CPUs usually have special load instructions that, for example, load a 16-bit value from memory, extend it to 32 bits, and write it to a register. They also have a matching store instruction that takes the value from the register and writes only its low 16 bits back to memory. Both operations typically take no longer than a 32-bit load or store would, so there is no speed difference here either. And since PPC can perform arithmetic only on registers (unlike x86, which can also operate on memory directly), this load/store sequence takes place anyway, whether you use 32-bit ints or 16-bit ints.
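    In C terms, the load/extend/compute/store pattern looks roughly like the sketch below. The function name `add_to_all` is hypothetical, and which instructions a compiler actually emits for it depends on the target and the optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds `delta` to every element of a 16-bit array. */
static void add_to_all(int16_t *v, size_t n, int16_t delta)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int32_t wide = v[i];   /* 16-bit load, sign-extended to 32 bits */
        wide += delta;         /* arithmetic on the full-width value */
        /* 16-bit store: only the low half goes back to memory. For
           out-of-range values this conversion is implementation-defined
           in C; mainstream compilers simply wrap. */
        v[i] = (int16_t)wide;
    }
}
```

    The same loop written for an `int32_t` array has the same shape, only without the widening and narrowing steps.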

    The only disadvantage when you chain multiple operations on a 32-bit CPU that can only operate on full registers is that the 32-bit result of one operation may have to be "cut back" to 16 bits before the next operation is performed; otherwise the result may not be correct. Such a cut back costs only a single CPU cycle, though (a simple AND for unsigned values, or a sign extension for signed ones), and compilers are very good at figuring out when it is really necessary and when leaving it out has no influence on the final result. The cut back is therefore not performed after every instruction, only when it is unavoidable. Some CPUs offer "enhanced" instructions that make it unnecessary, and I have seen plenty of code where I expected such a cut back, yet the generated assembly showed that the compiler had found a way to avoid it entirely.
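    To make the "cut back" concrete, here is a minimal sketch for unsigned 16-bit values. All function names are invented for this example. It shows a chain of additions where one cut back at the end is enough, and a case (a right shift) where the intermediate result has to be cut back first because the next operation depends on the upper bits.

```c
#include <assert.h>
#include <stdint.h>

/* Add two 16-bit values on a "full register": widen, add, then cut
   the result back to the low 16 bits with an AND. */
static uint16_t add16_masked(uint16_t a, uint16_t b)
{
    uint32_t wide = (uint32_t)a + (uint32_t)b;  /* full-width add */
    return (uint16_t)(wide & 0xFFFFu);          /* explicit cut back */
}

/* Three-term sum with a single cut back at the end. The low 16 bits of
   a sum depend only on the low 16 bits of its inputs, so the
   intermediate result needs no masking. */
static uint16_t sum3_one_mask(uint16_t a, uint16_t b, uint16_t c)
{
    uint32_t wide = (uint32_t)a + b + c;
    return (uint16_t)(wide & 0xFFFFu);
}

/* Same sum, cutting back after every step. */
static uint16_t sum3_mask_each(uint16_t a, uint16_t b, uint16_t c)
{
    return add16_masked(add16_masked(a, b), c);
}

/* Here the cut back cannot be skipped: a right shift pulls upper bits
   down, so the sum must be truncated before shifting. */
static uint16_t add_then_halve(uint16_t a, uint16_t b)
{
    uint16_t sum = add16_masked(a, b);
    return (uint16_t)(sum >> 1);
}

/* Small self-check a reader can call from main(). */
static void cut_back_self_check(void)
{
    assert(add16_masked(65535u, 1u) == 0u);
    assert(sum3_one_mask(65535u, 65535u, 3u) ==
           sum3_mask_each(65535u, 65535u, 3u));
    assert(add_then_halve(65535u, 3u) == 1u);
}
```

    Whether a real compiler emits the AND at each of these points can only be confirmed by looking at the generated assembly for the target in question.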

    So if you expect a general rule here, I'll have to disappoint you. Nobody can say for sure that 16-bit operations are always as fast as 32-bit operations, and nobody can say for sure that 32-bit operations are always faster. It also depends on what exactly your code does with those numbers and how it does it. I have seen benchmarks where 32-bit operations were faster on certain 32-bit CPUs than the same code with 16-bit operations, but I have also seen the opposite. Even switching to another compiler or upgrading your compiler version may turn everything around again. I can only say this: whoever claims that working with shorts is significantly slower than working with ints should provide sample source code for that claim and name the CPU and compiler used for testing, because I have not experienced anything like that in about the past 10 years. There may be situations where working with ints is maybe 1-5% faster, but anything below 10% is not "significant", and the question is whether it is worth using twice the memory in some cases just because it may buy you 2% performance. I don't think so.
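    For anyone who wants to measure this on their own CPU and compiler, a rough starting point might look like the following. It is a sketch, not a rigorous benchmark: the names `sum_i16`, `sum_i32`, and `compare_widths` are hypothetical, `clock()` has coarse resolution, and an optimizing compiler may still transform the loops, so checking the generated assembly is advisable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Sum an array of 16-bit values into a wide accumulator. */
static int64_t sum_i16(const int16_t *v, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Same loop for 32-bit values. */
static int64_t sum_i32(const int32_t *v, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Runs `reps` passes over n-element arrays of both widths and writes
   the elapsed CPU seconds to *sec16 and *sec32. Returns 1 if both
   widths produced the same total, 0 on mismatch, -1 if allocation
   failed. */
static int compare_widths(size_t n, int reps, double *sec16, double *sec32)
{
    assert(sec16 != NULL && sec32 != NULL);

    int16_t *a16 = malloc(n * sizeof *a16);
    int32_t *a32 = malloc(n * sizeof *a32);
    if (a16 == NULL || a32 == NULL) {
        free(a16);
        free(a32);
        return -1;
    }

    for (size_t i = 0; i < n; i++) {
        int16_t x = (int16_t)(i % 1000);  /* small values that fit a short */
        a16[i] = x;
        a32[i] = x;
    }

    /* Volatile sinks force the totals to be stored, which reduces the
       chance that the timed loops are removed entirely. */
    volatile int64_t sink16 = 0, sink32 = 0;

    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        sink16 += sum_i16(a16, n);
    clock_t t1 = clock();
    for (int r = 0; r < reps; r++)
        sink32 += sum_i32(a32, n);
    clock_t t2 = clock();

    *sec16 = (double)(t1 - t0) / CLOCKS_PER_SEC;
    *sec32 = (double)(t2 - t1) / CLOCKS_PER_SEC;

    int same = (sink16 == sink32);
    free(a16);
    free(a32);
    return same;
}
```

    A call such as `compare_widths(10000000, 20, &s16, &s32)` followed by printing both timings gives a first impression. The array size should be chosen well above the cache size if memory traffic is what you want to compare, and the CPU, compiler, and flags should be reported along with the numbers.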
