On 32-bit CPUs, is an 'integer' type more efficient than a 'short' type?


离开以前 2020-11-29 09:30

On a 32-bit CPU, an integer is 4 bytes and a short integer is 2 bytes. If I am writing a C/C++ application that uses many numeric values that will always fit within the pro

8 Answers

一整个雨季 (OP)

2020-11-29 10:26

When you say 32-bit, I'll assume you mean x86. 16-bit arithmetic can be quite slow: the operand-size prefix can make decoding really slow (on Intel CPUs, notably when the instruction carries a 16-bit immediate). So don't make your temp variables short int or int16_t.

However, x86 can efficiently load 16-bit and 8-bit integers into 32-bit or 64-bit registers (movzx / movsx: zero extension and sign extension). So feel free to use short int for arrays and struct fields, but make sure you use int or long for your temp variables.
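As a minimal sketch of that advice (the function name `sum_samples` and the sample data are made up for illustration, and `int` is assumed to be 32 bits wide, as it is on x86), the array is stored as 16-bit values while the loop temporary and accumulator are plain `int`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum an array of 16-bit samples.
 * Storage is int16_t (compact in memory and cache); the temporary and
 * the accumulator are int, assumed here to be 32 bits wide. */
int sum_samples(const int16_t *samples, size_t count)
{
    assert(samples != NULL || count == 0);

    int total = 0;                  /* full-width accumulator, not int16_t */
    for (size_t i = 0; i < count; ++i) {
        int value = samples[i];     /* widening load; on x86 this is typically a movsx */
        total += value;
    }
    return total;
}
```

Calling `sum_samples` on `{1000, -2000, 30000, 30000}` gives 59000, a result that would not fit in a 16-bit accumulator.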

Quoting the question: "However, if I am adding together two short integers, would the CPU package both values in a single pass in parallel (thus spanning the 4 byte bandwidth of the bus)?"

That is nonsense. Load/store instructions interact with L1 cache, and the limiting factor is the number of operations; width is irrelevant. For example, on Core 2: 1 load and 1 store per cycle, regardless of width. L1 cache has a 128-bit or 256-bit path to L2 cache.

If loads are your bottleneck, one wide load that you split up with shifts or masks after loading can help. Alternatively, use SIMD to process the data in parallel, with no unpacking needed after the wide load.
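A rough sketch of the wide-load idea, assuming a little-endian target such as x86 (the helper name `load_pair_u16` is hypothetical): two adjacent 16-bit values are fetched with a single 32-bit load and then separated with a mask and a shift. Whether this actually helps depends on the surrounding code, so treat it as something to measure rather than a guaranteed win.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read src[0] and src[1] with one 32-bit load.
 * Assumes little-endian byte order (true on x86), so the element at the
 * lower address lands in the low 16 bits of the loaded word. */
void load_pair_u16(const uint16_t *src, uint32_t *first, uint32_t *second)
{
    assert(src != NULL && first != NULL && second != NULL);

    uint32_t packed;
    memcpy(&packed, src, sizeof packed);  /* compilers typically emit a single 32-bit load */

    *first  = packed & 0xFFFFu;           /* mask keeps the low half: src[0] */
    *second = packed >> 16;               /* shift brings down the high half: src[1] */
}
```

Using `memcpy` rather than a pointer cast keeps the load free of aliasing and alignment problems in C.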
