Code alignment in one object file is affecting the performance of a function in another object file

余生分开走 2020-12-07 03:43

I'm familiar with data alignment and performance but I'm rather new to aligning code. I started programming in x86-64 assembly recently with NASM and have been comparing

2 Answers
  •  长情又很酷
    2020-12-07 04:12

    Ahhh, code alignment...

    Some basics of code alignment:

    • Most Intel architectures fetch 16B worth of instructions per clock.
    • The branch predictor has a larger window and looks at typically double that, per clock. The idea is to get ahead of the instructions fetched.
    • How your code is aligned will dictate which instructions you have available to decode and predict at any given clock (simple code locality argument).
    • Most modern Intel architectures cache instructions at various levels (either at the macro-instruction level before decoding, or at the micro-instruction level after decoding). This eliminates the effects of code alignment, as long as you are executing out of the micro/macro cache.
    • Also, most modern Intel architectures have some form of loop stream detector that detects loops and executes them out of some cache that bypasses the front-end fetch mechanism.
    • Some Intel architectures are finicky about what they can cache and what they can't. There are often dependencies on the number of instructions, uops, alignment, branches, etc. Alignment may, in some cases, affect what's cached and what's not, and you can create cases where padding prevents or causes a loop to get cached.
    • To make things even more complicated, the addresses of instructions are also used by the branch predictor. They are used in several ways, including (1) as a lookup into a branch prediction buffer to predict branches, (2) as a key/value to maintain some form of global state of branch behavior for prediction purposes, (3) as a key for determining indirect branch targets, etc. Therefore, alignment can actually have a pretty huge impact on branch prediction in some cases, due to aliasing or other poor prediction.
    • Some architectures use instruction addresses to determine when to prefetch data, and code alignment can interfere with that, if just the right conditions exist.
    • Aligning loops is not always a good thing to do, depending on how the code is laid out (especially if there's control flow in the loop).
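
To make the fetch-window point concrete, here is a minimal Python sketch of the address arithmetic. It assumes the 16-byte fetch block from the list above; the function names and example addresses are hypothetical, and a real front end is more complicated than this model.

```python
FETCH_BLOCK = 16  # bytes fetched per clock, as assumed in the list above

def padding_to_align(addr, alignment=FETCH_BLOCK):
    """Bytes of padding needed to move addr up to the next aligned address."""
    return (-addr) % alignment

def fetch_blocks_touched(start, size, block=FETCH_BLOCK):
    """Number of aligned fetch blocks overlapped by code at [start, start + size)."""
    if size <= 0:
        return 0
    first = start // block
    last = (start + size - 1) // block
    return last - first + 1

# A hypothetical 20-byte loop body at two different start addresses.
print(fetch_blocks_touched(0x401000, 20))  # aligned start: 2 blocks
print(fetch_blocks_touched(0x40100D, 20))  # start at offset 13: 3 blocks
print(padding_to_align(0x40100D))          # 3 bytes of padding to reach 0x401010
```

The same 20 bytes of code need two fetches in one placement and three in the other, which is the simple locality argument in the third bullet.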

    Having said all that blah blah, your issue could be any one of these. It's important to look at the disassembly of not just the object, but the executable. You want to see what the final addresses are after everything is linked. Making changes in one object could affect the alignment/addresses of instructions in another object after linking.
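
The cross-object effect can be illustrated with a toy layout model in Python. This is only a sketch: it assumes the linker places each object's code back to back and pads only up to each section's declared alignment. The object names, sizes, and base address are made up for illustration.

```python
def layout(sections, base=0x401000):
    """Place sections in order; each entry is (name, size, alignment).
    Returns {name: start_address}. A toy model, not a real linker."""
    addr = base
    placed = {}
    for name, size, align in sections:
        addr += (-addr) % align  # pad up to this section's alignment
        placed[name] = addr
        addr += size
    return placed

# Hypothetical objects: a.o is edited, b.o is untouched.
# An alignment of 1 means "no alignment requirement".
before = layout([("a.o", 0x40, 1), ("b.o", 0x80, 1)])
after = layout([("a.o", 0x45, 1), ("b.o", 0x80, 1)])   # a.o grew by 5 bytes
fixed = layout([("a.o", 0x45, 1), ("b.o", 0x80, 16)])  # b.o requests 16B alignment

for label, result in (("before", before), ("after", after), ("fixed", fixed)):
    start = result["b.o"]
    print(f"{label}: b.o at {start:#x}, offset in 16B block = {start % 16}")
```

In this model, growing a.o by 5 bytes moves every instruction in b.o by 5 bytes even though b.o itself did not change, and requesting an alignment for b.o restores its placement. For a real build, the addresses to trust are the ones in the disassembly of the linked executable.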

    In some cases, it's nearly impossible to align your code in a way that maximizes performance, simply because so many low-level architectural behaviors are hard to control and predict (that doesn't necessarily mean this is always the case). In some cases, your best bet is to have some default alignment strategy (say, align all entries on 16B boundaries, and outer loops the same) so that you minimize how much your performance varies from change to change. As a general strategy, aligning function entries is good. Aligning loops that are relatively small is good, as long as you're not adding nops in your execution path.

    Beyond that, I'd need more info/data to pinpoint your exact problem, but thought some of this might help. Good luck :)
