What's the advantage of having nonvolatile registers in a calling convention?

泪湿孤枕 posted on 2019-12-19 10:06:58

Question


I'm programming a JIT compiler and I've been surprised to discover that so many of the x86-64 registers are nonvolatile (callee-preserved) in the Win64 calling convention. It seems to me that nonvolatile registers just amount to more work in all functions that could use these registers. This seems especially true in the case of numeric computations where you'd want to use many registers in a leaf function, say some kind of highly optimized matrix multiplication. However, only 6 of the 16 SSE registers are volatile, for example, so you'd have a lot of spilling to do if you need to use more than that.

So yeah, I don't get it. What's the tradeoff here?


Answer 1:


If registers are caller-saved, then the caller always has to save and reload any live ones around a function call. But if registers are callee-saved, then the callee only has to save the registers that it actually uses, and only on the paths where it knows they're going to be used (i.e. maybe not at all in an early-exit scenario). The disadvantage of this convention is that the callee has no knowledge of the caller, so it might be saving registers whose values are dead anyway, but I guess that's seen as a smaller concern.
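
To make the early-exit point concrete, here is a minimal C++ sketch. The function names (`slow_sum`, `sum_with_bias`) are made up for illustration, and what a compiler actually emits depends on the compiler, target, and flags (GCC, for example, has a shrink-wrapping optimization controlled by `-fshrink-wrap`); the comments describe what a compiler may do, not what it is guaranteed to do.

```cpp
#include <numeric>
#include <vector>

// Illustrative helper macro (an assumption of this sketch, not part of any
// calling convention): keep the slow path out of line so the call is real.
#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE
#endif

// Hypothetical slow path.
NOINLINE long slow_sum(const std::vector<int>& v, long start) {
    return std::accumulate(v.begin(), v.end(), start);
}

long sum_with_bias(const std::vector<int>& v, long bias) {
    // Early exit: nothing has to stay live across a call on this path, so a
    // compiler may skip saving any call-preserved register here.
    if (v.empty()) {
        return bias;
    }
    // 'bias' must survive the call below. A compiler will typically keep it
    // in a call-preserved register (e.g. rbx), which this function has to
    // save and restore, but only on this path if the prologue is sunk here.
    long total = slow_sum(v, 0);
    return total + bias;
}
```

Calling `sum_with_bias(std::vector<int>{}, 7)` takes the early exit and returns 7 without touching the slow path. Comparing the generated assembly for the two paths (for instance with `-O2 -S`) is the way to see whether your compiler really delays the register saves.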




Answer 2:


The advantage of having nonvolatile registers is: performance.

The less data is moved, the more efficient a CPU is.

The more volatile registers there are, the more data has to be saved and reloaded around calls, and the more energy the CPU needs.




Answer 3:


The Windows x86-64 calling convention with only 6 call-clobbered xmm registers is not a very good design, you're right. Most SIMD (and many scalar FP) loops don't contain any function calls, so they gain nothing from having their data in call-preserved registers. The save/restore is pure downside because it's rare that any of their callers are making use of this non-volatile state.

In x86-64 System V, all the vector registers are call-clobbered, which is maybe too far the other way. Having 1 or 2 call-preserved would be nice in many cases, especially for code that makes some math library function calls. (Use gcc -fno-math-errno to let simple ones inline better; sometimes the only reason they don't is that they need to set errno on NaN.)

Related: how the x86-64 SysV calling convention was chosen: looking at code size and instruction count for gcc compiling SPECint/SPECfp.


For integer regs, having some of each is definitely good, and all "normal" calling conventions (for all architectures, not just x86) do in fact have a mix. This reduces the total amount of work done spilling/restoring in callers and callees combined.

Forcing the caller to spill/reload everything around every function call is not good for code-size or performance. Saving / restoring some call-preserved regs at the start/end of the function lets non-leaf functions keep some things live in registers across calls.

Consider some code that calculates a couple things and then does cout << "result: " << a << "foo" << b*c << '\n'; That's 5 function calls to std::ostream operator<<, and they generally don't inline. Keeping the address of cout and the locals you just computed in non-volatile registers means you only need some cheap mov reg,reg instructions to set up the args for the next call. (Or push in a stack-args calling convention).
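
A minimal sketch of that pattern, written against a `std::ostringstream` instead of `cout` so the result can be checked. The function name `format_result` is made up for illustration, and the register comments describe what a compiler would typically do on x86-64, not a guarantee.

```cpp
#include <sstream>
#include <string>

std::string format_result(int a, int b, int c) {
    std::ostringstream out;
    int product = b * c;   // computed once, needed several calls later

    // Each << below is a separate call to an operator<< overload that
    // generally does not inline. Across those calls, the address of 'out',
    // 'a', and 'product' all stay live. With call-preserved registers
    // available, the compiler can park them there and set up each call with
    // register-to-register moves instead of reloading from the stack.
    out << "result: ";
    out << a;
    out << "foo";
    out << product;
    out << '\n';

    return out.str();
}
```

For example, `format_result(1, 2, 3)` returns the string `"result: 1foo6\n"`. If every register were call-clobbered, `a` and `product` would have to be spilled to the stack before the first call and reloaded before the calls that use them.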

But having some call-clobbered registers that can be used without saving is also very important. Functions that don't need all the architectural registers can just use the call-clobbered registers as temporaries. This avoids introducing a spill/reload into the critical path for the caller's dependency chains (for very small callees), as well as saving instructions.

Sometimes a complex function will save/restore some call-preserved registers just to get more total registers (like you're seeing with XMM for number crunching). This is generally worth it; saving/restoring the caller's non-volatile registers once is usually better than spilling/reloading your own local variables to the stack, especially if you would otherwise have to do that inside a loop.


Another reason for call-clobbered registers is that usually some of your values are "dead" after a function call: you only needed them as args to the function. Computing them in call-clobbered registers means you don't have to save/restore anything to free up those registers, and your callee can freely use them too. This is even better in calling conventions that pass args in registers: you can compute your inputs directly in the arg-passing registers. (And copy any to call-preserved regs or spill them to stack memory if you also need them after the function.)
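
The difference between a value that dies at the call and one that is needed afterwards can be sketched in C++ like this. The names (`mix`, `hash_pair`, `hash_pair_keep`) are hypothetical, and the comments describe typical x86-64 code generation rather than guaranteed output.

```cpp
#include <cstdint>

// Illustrative helper macro (an assumption of this sketch): force a real call.
#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE
#endif

// Hypothetical callee, standing in for any function that does not inline.
NOINLINE std::uint64_t mix(std::uint64_t x, std::uint64_t y) {
    return x * 31 + y;
}

// Both computed arguments are dead after the call. They can be computed
// directly in the argument-passing registers (rdi/rsi in x86-64 System V,
// rcx/rdx in Win64), with nothing to save or restore. The compiler may even
// turn this into a tail call.
std::uint64_t hash_pair(std::uint64_t a, std::uint64_t b) {
    return mix(a + 1, b * 3);
}

// Here 'a' is still needed after the call, so it has to be copied to a
// call-preserved register (or spilled to the stack) before calling mix.
std::uint64_t hash_pair_keep(std::uint64_t a, std::uint64_t b) {
    std::uint64_t h = mix(a + 1, b * 3);
    return h ^ a;
}
```

With these definitions, `hash_pair(1, 2)` evaluates `mix(2, 6)`, and `hash_pair_keep(1, 2)` returns that result XORed with 1. Only the second function needs any save/restore work, and only for the one value that outlives the call.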

(I like the terms call-preserved vs. call-clobbered, rather than caller-saved vs. callee-saved. The latter terms imply that someone must save the registers, instead of just letting dead values die. volatile / non-volatile is not bad, but those terms also have other technical meanings as C keywords, or in terms of flash vs. DRAM.)



Source: https://stackoverflow.com/questions/10392895/whats-the-advantage-of-having-nonvolatile-registers-in-a-calling-convention
