Fastest integer type for common architectures

int_fast8_t is always the fastest integer type in a correct implementation. There can never be integer types smaller than 8 bits (because CHAR_BIT>=8 is required), and since int_fast8_t is the fastest integer type with at least 8 bits, it's thus the fastest integer type, period.
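
For illustration, here is a minimal sketch of using the "fast" typedefs from <stdint.h>; the buffer contents and the wider accumulator are my own choices, not anything mandated by the standard:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int_fast8_t buf[8] = {1, 2, 3, 4};  /* at least 8 bits wide, fastest such type */
    int_fast32_t sum = 0;               /* wider accumulator to avoid overflow */
    for (size_t i = 0; i < sizeof buf / sizeof buf[0]; ++i)
        sum += buf[i];
    /* the actual width of the fast types varies by platform, so cast for printing */
    printf("sum = %jd\n", (intmax_t)sum);
    return 0;
}
```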

Theoretically, int is the best bet. It should map to the CPU's native register size, and thus be "optimal" in the sense you're asking about.

However, you may still find that an int-64 or int-128 is faster on some CPUs than an int-32, because although these are larger than the register size, they reduce the number of iterations of your loop, and thus may work out more efficient by minimising the loop overheads and/or taking advantage of sequential (burst) memory access to load/store the data faster.

(For example, on ARM2 processors it took 4 memory cycles to load one 32-bit register, but only 5 cycles to load two sequentially, and 7 cycles to load 4 sequentially. The routine you suggest above would be optimised to use as many registers as you could free up (usually 8 to 10), and could therefore run up to 3 or 4 times faster by using multiple registers per loop iteration.)
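
To make the "wider type per iteration" idea concrete, here is a hedged sketch (the function and buffer names are my own): it scans a byte buffer for the first nonzero byte by testing 64 bits at a time, then finishes byte by byte.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

size_t first_nonzero_byte(const uint8_t *buf, size_t len) {
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, buf + i, sizeof word); /* avoids alignment/aliasing issues */
        if (word != 0)
            break;                           /* the nonzero byte is in this word */
    }
    for (; i < len; ++i)                     /* tail, and pinpointing the byte */
        if (buf[i] != 0)
            return i;
    return len;                              /* not found */
}
```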

The only way to be sure is to write several routines and then profile them on the specific target machine to find out which produces the best performance.

I'm not sure I really understand the question, but why aren't you just using int? Quoting from my free draft copy of the (wrong, i.e. C++) standard: "Plain ints have the natural size suggested by the architecture of the execution environment."

But I think that if you want the optimal integer type for a certain operation, it will differ depending on what that operation is. Finding the first set bit in a large data buffer, finding a number in a sequence of integers, or moving them around could very well each have a completely different optimal type.

EDIT:

For whatever it's worth, I did a small benchmark. On my particular system (Intel i7 920 with Linux, gcc -O3) it turns out that long ints (64 bits) are quite a bit faster than plain ints (32 bits), on this particular example. I would have guessed the opposite.

If you want to be certain you've got the fastest implementation, why not benchmark each one on the systems you're expecting to run on instead of trying to guess?
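
A minimal benchmark sketch along those lines, assuming POSIX clock_gettime; the summation kernel and array size are placeholders for whatever routine you actually care about, and you should compile with the flags you intend to ship (e.g. gcc -O3):

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define N (1 << 24)
static int32_t a32[N];   /* zero-initialised dummy data */
static int64_t a64[N];

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    int32_t s32 = 0;
    int64_t s64 = 0;

    double t0 = seconds();
    for (size_t i = 0; i < N; ++i) s32 += a32[i];
    double t1 = seconds();
    for (size_t i = 0; i < N; ++i) s64 += a64[i];
    double t2 = seconds();

    /* print the sums so the compiler cannot discard the loops */
    printf("sums: %d %lld\n", (int)s32, (long long)s64);
    printf("int32: %.3f s, int64: %.3f s\n", t1 - t0, t2 - t1);
    return 0;
}
```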

The answer is int itself. At least in C++, where 3.9.1/2 of the standard says:

Plain ints have the natural size suggested by the architecture of the execution environment

I expect the same is true for C, though I don't have any of the standards documents.

I would guess that the types size_t (for an unsigned type) and ptrdiff_t (for a signed type) will usually correspond to quite efficient integer types on any given platform.

But nothing can prove that other than inspecting the produced assembler and running benchmarks.

Edit, incorporating the various comments here and in other replies:

size_t and ptrdiff_t are the only typedefs that are normative in C99 and for which one may make a reasonable assumption that they are related to the architecture.

There are 5 different ranks for the standard integer types (char, short, int, long, long long). All the pressure goes towards having types of widths 8, 16, 32, 64 and, in the near future, 128. As a consequence int will be stuck at 32 bits. Its definition will then have nothing to do with efficiency on the platform, but will just be constrained by that width requirement.

If you're compiling with gcc, I'd recommend using __builtin_ffs() for finding the first set bit:

Built-in Function: int __builtin_ffs (int x) — Returns one plus the index of the least significant 1-bit of x, or, if x is zero, returns zero.

This is often compiled down to a single native assembly instruction.
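
A small usage example (GCC/Clang specific; the array contents are made up): scan an array of words and report the first set bit. Note that __builtin_ffs returns a 1-based bit index, and 0 when no bit is set.

```c
#include <stdio.h>
#include <stddef.h>

int main(void) {
    unsigned int words[] = {0, 0, 0x90, 0};
    size_t n = sizeof words / sizeof words[0];
    for (size_t i = 0; i < n; ++i) {
        int bit = __builtin_ffs((int)words[i]); /* 1-based index, 0 if none set */
        if (bit != 0) {
            printf("first set bit: word %zu, bit %d\n", i, bit - 1);
            return 0;
        }
    }
    puts("no bit set");
    return 0;
}
```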

It is not possible to answer this question since the question is incomplete. As an analogy, consider the question:

What is the fastest vehicle?

A Bugatti Veyron? Certainly fast, but no good for going from London to New York.

What is missing from the question is the context the integer will be used in. In the original example above, I doubt you'd see much difference between 8-, 32- or 64-bit values if the array is large and sparse, since you'll be hitting memory bandwidth limits before CPU limits.

The main point is, the architecture does not define what size the various integer types are, it's the compiler designer that does that. The designer will carefully weigh up the pros and cons for various sizes for each type for a given architecture and pick the most appropriate.

I guess a 32-bit int was chosen for 64-bit systems because, for most of the operations ints are used for, 32 bits are enough. Since memory bandwidth is a limiting factor, saving on memory use was probably the overriding factor.

For all existing mainstream architectures long is the fastest type at present for loop throughput.
