Optimizing member variable order in C++

前端未结

关注

 10  709

长发绾君心 2020-12-04 07:18

I was reading a blog post by a game coder for Introversion and he is busily trying to squeeze every CPU tick he can out of the code. One trick he mentions off-hand is to

10条回答

忘掉有多难 (楼主)

2020-12-04 07:37
We have slightly different guidelines for members here (ARM architecture target, mostly THUMB 16-bit codegen for various reasons):
- group by alignment requirements (or, for newbies, "group by size" usually does the trick)
- smallest first
"group by alignment" is somewhat obvious, and outside the scope of this question; it avoids padding, uses less memory, etc.

The second bullet, though, derives from the small 5-bit "immediate" field size on the THUMB LDRB (Load Register Byte), LDRH (Load Register Halfword), and LDR (Load Register) instructions.

5 bits means offsets of 0-31 can be encoded. Effectively, assuming "this" is handy in a register (which it usually is):
- 8-bit bytes can be loaded in one instruction if they exist at this+0 through this+31
- 16-bit halfwords if they exist at this+0 through this+62;
- 32-bit machine words if they exist at this+0 through this+124.
If they're outside this range, multiple instructions have to be generated: either a sequence of ADDs with immediates to accumulate the appropriate address in a register, or worse yet, a load from the literal pool at the end of the function.

If we do hit the literal pool, it hurts: the literal pool goes through the d-cache, not the i-cache; this means at least a cacheline worth of loads from main memory for the first literal pool access, and then a host of potential eviction and invalidation issues between the d-cache and i-cache if the literal pool doesn't start on its own cache line (i.e. if the actual code doesn't end at the end of a cache line).

(If I had a few wishes for the compiler we're working with, a way to force literal pools to start on cacheline boundaries would be one of them.)

(Unrelatedly, one of the things we do to avoid literal pool usage is keep all of our "globals" in a single table. This means one literal pool lookup for the "GlobalTable", rather than multiple lookups for each global. If you're really clever you might be able to keep your GlobalTable in some sort of memory that can be accessed without loading a literal pool entry -- was it .sbss?)
0 讨论(0)

查看其它10个回答
发布评论:

提交评论
- 加载中...