Why do INC and ADD 1 have different performance? [duplicate]

Posted by 非 Y 不嫁゛ on 2019-12-03 11:29:11

For the x86 architecture, INC updates only a subset of the condition codes (it leaves the carry flag, CF, unchanged), whereas ADD updates the entire set of condition codes. (Other architectures have different rules, so this discussion may or may not apply.)

So an INC instruction must wait for previous instructions that update the condition code bits to finish before it can merge its own changes into that previous value to produce its final condition code result.

ADD can produce final condition code bits without regard to previous values of the condition codes, so it doesn't need to wait for previous instructions to finish computing their value of the condition codes.

Consequence: you can execute ADD in parallel with lots of other instructions, and INC with fewer other instructions. Thus, ADD appears to be faster in practice.
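To make the dependency concrete, here is a rough C model of the flag behaviour described above. It is only a sketch: the `Flags` struct and the `model_add`, `model_inc` and `demo_carry_dependency` names are made up for illustration, PF and AF are left out, and it says nothing about how any particular CPU actually tracks flags internally. The point is just that the ADD model never reads the old flags, while the flags after the INC model still contain whatever CF was there before.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of four x86 status flags (PF and AF omitted for brevity). */
typedef struct {
    bool cf; /* carry */
    bool zf; /* zero */
    bool sf; /* sign */
    bool of; /* signed overflow */
} Flags;

/* Model of a 32-bit ADD: every flag is computed from the operands alone,
 * so the previous contents of *flags are irrelevant. */
uint32_t model_add(uint32_t a, uint32_t b, Flags *flags)
{
    uint32_t r = a + b;
    flags->cf = r < a;                             /* unsigned wraparound */
    flags->zf = (r == 0);
    flags->sf = (r >> 31) != 0;
    flags->of = ((~(a ^ b) & (a ^ r)) >> 31) != 0; /* same-sign inputs, different-sign result */
    return r;
}

/* Model of a 32-bit INC: CF is not written, so the flags value after INC
 * still depends on whichever earlier instruction last set CF. */
uint32_t model_inc(uint32_t a, Flags *flags)
{
    uint32_t r = a + 1u;
    /* flags->cf intentionally left as it was */
    flags->zf = (r == 0);
    flags->sf = (r >> 31) != 0;
    flags->of = (a == 0x7FFFFFFFu);
    return r;
}

/* Usage sketch: same starting flags, different CF afterwards. */
void demo_carry_dependency(void)
{
    Flags after_add = { .cf = true };
    Flags after_inc = { .cf = true };
    model_add(5u, 1u, &after_add);
    model_inc(5u, &after_inc);
    assert(after_add.cf == false); /* ADD overwrote CF */
    assert(after_inc.cf == true);  /* INC carried the old CF through */
}
```

In the model, `model_inc` has one more input than `model_add` (the old CF), which is the extra dependency the answer is talking about.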

(I believe there is a similar issue when working with 8-bit registers (e.g., AL) in the context of full-width registers (e.g., EAX), in that an AL update requires that previous EAX updates complete first.)
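The partial-register case can be sketched the same way. Again this is only an illustrative C model with hypothetical helper names (`write_al`, `write_eax`), not a description of any specific CPU: writing AL has to keep the upper 24 bits of EAX, so the new EAX value is a function of the old one, whereas a full-width write is not.

```c
#include <stdint.h>

/* Model of writing AL: the upper 24 bits of the old EAX must be merged in,
 * so the result depends on the previous EAX value. */
uint32_t write_al(uint32_t old_eax, uint8_t al)
{
    return (old_eax & 0xFFFFFF00u) | al;
}

/* Model of writing all of EAX: the previous value is simply discarded. */
uint32_t write_eax(uint32_t old_eax, uint32_t value)
{
    (void)old_eax; /* unused on purpose: no dependency on the old contents */
    return value;
}
```

For example, `write_al(0x12345678u, 0xAB)` gives `0x123456AB`, while `write_eax(0x12345678u, 0xAB)` gives `0x000000AB` regardless of what was there before.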

I don't use INC or DEC in my high performance assembly code anymore. If you aren't ultrasensitive to execution times, then INC or DEC is just fine and can reduce the size of your instruction stream.

The XOR ax, ax bit is, I gather, a few years out of date, and assigning zero now beats it (so I'm told).

The C bit about counter++ rather than counter+=1 is a couple of decades out of date. Definitely.

The simple reason for the first one, with assembly, is that all instructions will be translated into some sort of operation on the part of the CPU, and while the designers will always try to make everything as fast as possible, they'll do a better job with some than with others. It's not hard to imagine how an INC could be faster since it only has to deal with one register, though that's grossly over-simplifying (but I don't know much about these things, so over-simplifying is all I can do on that part).

The C one, though, is long-outdated nonsense. If we have a particular CPU where INC beats ADD, why on earth would the compiler designer not use INC instead of ADD, for both counter++ and counter+=1? Compilers do a lot of optimisations, and that sort of change is far from the most complicated.
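As a small illustration of the C point, the two hypothetical functions below (`bump_postfix` and `bump_add_assign` are names invented for this sketch) mean exactly the same thing when the value of the expression is not used, so there is nothing to stop a compiler from emitting the same instruction for both. I would expect an optimising compiler to do so, but that is an assumption you can check yourself by comparing the assembly output, for example with `gcc -O2 -S`.

```c
/* Increment written with the ++ operator; the expression's value is discarded. */
void bump_postfix(int *counter)
{
    (*counter)++;
}

/* Increment written with += 1; same effect on *counter. */
void bump_add_assign(int *counter)
{
    *counter += 1;
}
```

Whether the compiler picks INC or ADD for these is its own decision based on the target CPU, not on which spelling you used in the source.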
