How to access C struct/variables from inline asm?

寵の児, submitted 2019-11-28 14:16:44
Peter Cordes

If you only need 32-bit / 32-bit => 32-bit division, let the compiler use both outputs of div. gcc, clang and icc all do this just fine, as you can see on the Godbolt compiler explorer:

uint32_t q = bn1->dat[i] / bn2->dat[j];
uint32_t m = bn1->dat[i] % bn2->dat[j];

Compilers are quite good at CSEing (common-subexpression eliminating) that into a single div. Just make sure you don't store the quotient anywhere that gcc can't prove won't affect the inputs of the remainder.

e.g. with *m = dat[i] / dat[j], the store through m might overlap (alias) dat[i] or dat[j], so gcc would have to reload the operands and redo the div for the % operation. See the Godbolt link for bad/good examples.
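A minimal sketch of the bad/good pattern, assuming a hypothetical struct bignum with a fixed-size dat array (the question's real type isn't shown here). Whether a particular compiler actually merges the / and % into one div depends on its version and options, so check the asm on Godbolt rather than relying on this.

```c
#include <stdint.h>

/* Hypothetical bignum type, for illustration only. */
struct bignum {
    uint32_t dat[8];
};

/* Risky: the store through q might alias bn1->dat[i] or bn2->dat[j],
   so the compiler may have to reload both operands and divide again for %. */
void divmod_may_alias(const struct bignum *bn1, const struct bignum *bn2,
                      int i, int j, uint32_t *q, uint32_t *m)
{
    *q = bn1->dat[i] / bn2->dat[j];
    *m = bn1->dat[i] % bn2->dat[j];
}

/* Better: compute both results into locals before storing anything,
   so on x86 a single div can provide both quotient and remainder. */
void divmod_locals(const struct bignum *bn1, const struct bignum *bn2,
                   int i, int j, uint32_t *q, uint32_t *m)
{
    uint32_t quot = bn1->dat[i] / bn2->dat[j];
    uint32_t rem  = bn1->dat[i] % bn2->dat[j];
    *q = quot;
    *m = rem;
}
```

The aliasing is not just theoretical: if q points at bn1->dat[i], the % in divmod_may_alias must see the freshly stored quotient, which is exactly why the compiler can't reuse the first div's remainder there.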


Using inline asm for 32bit / 32bit = 32bit div doesn't gain you anything, and actually makes worse code with clang (see the godbolt link).

If you need 64-bit / 32-bit = 32-bit division, though, you probably need asm, unless there's a compiler built-in for it (GNU C doesn't have one, AFAICT). In 32-bit code, the obvious way in C (casting the operands to uint64_t) generates a call to a 64-bit / 64-bit = 64-bit libgcc function, which has branches and multiple div instructions. gcc isn't good at proving that the quotient will fit in 32 bits, which is what it would need to know to use a single div instruction without risking a #DE (divide error) exception.
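A minimal sketch of the 64-bit / 32-bit = 32-bit case, assuming GNU C on x86. The function name div64by32 and its hi/lo argument split are hypothetical. The caller must guarantee hi < d; otherwise the quotient doesn't fit in 32 bits and divl raises #DE, which is exactly the range check the compiler can't prove on its own. On non-x86 targets the sketch falls back to the plain C expression.

```c
#include <stdint.h>

/* Hypothetical helper: divides the 64-bit value hi:lo by d.
   Precondition: hi < d, so the quotient fits in 32 bits (otherwise divl faults with #DE). */
static inline uint32_t div64by32(uint32_t hi, uint32_t lo, uint32_t d, uint32_t *rem)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t q, r;
    /* divl divides EDX:EAX by the operand: quotient -> EAX, remainder -> EDX. */
    __asm__ ("divl %[d]"
             : "=a" (q), "=d" (r)
             : "a" (lo), "d" (hi), [d] "rm" (d));
    *rem = r;
    return q;
#else
    /* Portable fallback; on some targets this becomes a 64/64-bit library call. */
    uint64_t n = ((uint64_t)hi << 32) | lo;
    *rem = (uint32_t)(n % d);
    return (uint32_t)(n / d);
#endif
}
```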

For many other instructions, you can usually avoid inline asm by using builtin functions, e.g. for popcount. With -mpopcnt, the popcount builtin compiles to the popcnt instruction (and accounts for the false dependency on the output operand that Intel CPUs have). Without it, it compiles to a libgcc function call.
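For example, assuming gcc or clang (__builtin_popcount is a GNU C builtin; popcount32 is just an illustrative wrapper name), a popcount needs no asm at all:

```c
#include <stdint.h>

/* Population count via the GCC/clang builtin rather than inline asm.
   Built with -mpopcnt this can use the popcnt instruction; otherwise the
   compiler falls back to a library call or an inline bit-twiddling sequence. */
static inline int popcount32(uint32_t x)
{
    return __builtin_popcount(x);
}
```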

Always prefer builtins, or pure C that compiles to good asm, so the compiler knows what the code does. When inlining makes some of the arguments known at compile-time, pure C can be optimized away or simplified, but code using inline asm will just load constants into registers and do a div at run-time. Inline asm also defeats CSE between similar computations on the same data, and of course can't auto-vectorize.
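To make the constant-propagation point concrete, here is a sketch assuming GNU C on x86 (div_c, div_asm, div8_c and div8_asm are hypothetical names). With optimization enabled, div8_c can typically be compiled to a single shift, while div8_asm still materializes the constant 8 and executes divl. Compare the two on Godbolt rather than taking that on faith.

```c
#include <stdint.h>

/* Pure C: after inlining, a constant divisor is visible to the optimizer. */
static inline uint32_t div_c(uint32_t x, uint32_t d)
{
    return x / d;
}

/* Inline asm: the template is opaque to the optimizer, so d must be
   put in a register or memory and a real div executed, even when d is constant. */
static inline uint32_t div_asm(uint32_t x, uint32_t d)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t q, r;
    __asm__ ("divl %[d]"
             : "=a" (q), "=d" (r)
             : "a" (x), "d" (0), [d] "rm" (d));
    (void)r;  /* remainder unused here */
    return q;
#else
    return x / d;  /* non-x86 fallback so the sketch still compiles */
#endif
}

/* With optimization, this usually becomes a shift... */
uint32_t div8_c(uint32_t x)   { return div_c(x, 8); }
/* ...while this still loads 8 and runs divl on x86. */
uint32_t div8_asm(uint32_t x) { return div_asm(x, 8); }
```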


Using GNU C syntax the right way

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html explains how to tell the compiler which variables you want in registers, and what the outputs are.

You can use Intel/MASM-like syntax and mnemonics, and non-% register names if you like, preferably by compiling with -masm=intel. The AT&T syntax bug (fsub and fsubr mnemonics are reversed) might still be present in intel-syntax mode; I forget.

Most software projects that use GNU C inline asm use AT&T syntax only.

See also the bottom of this answer for more GNU C inline asm info, and the tag wiki.


An asm statement takes one template string, followed by up to three colon-separated sections: output operands, input operands, and clobbers. The easiest way to write a multi-line asm statement is to make each asm line a separate string ending with \n, and let the compiler implicitly concatenate them.
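As a sketch of that layout, assuming GNU C on x86 (add64_via_adc is a hypothetical name), here is a two-instruction add-with-carry on 32-bit halves, the kind of step a bignum add needs. Each asm line is its own string (the extra \t only tidies the compiler's asm listing), and the three sections after the template hold outputs, inputs, and clobbers. The & early-clobber on alo stops the compiler from putting bhi in the same register, since alo is written before bhi is read.

```c
#include <stdint.h>

/* Hypothetical helper: (ahi:alo) + (bhi:blo), returned as a 64-bit value (wraps mod 2^64). */
static inline uint64_t add64_via_adc(uint32_t ahi, uint32_t alo,
                                     uint32_t bhi, uint32_t blo)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("addl %[blo], %[alo]\n\t"   /* low halves; sets CF on carry-out */
             "adcl %[bhi], %[ahi]"       /* high halves plus CF */
             : [alo] "+&r" (alo), [ahi] "+r" (ahi)  /* outputs (read-write) */
             : [blo] "rm" (blo), [bhi] "rm" (bhi)   /* inputs */
             : "cc");                               /* clobbers: flags */
    return ((uint64_t)ahi << 32) | alo;
#else
    /* Portable fallback for non-x86 targets. */
    return (((uint64_t)ahi << 32) | alo) + (((uint64_t)bhi << 32) | blo);
#endif
}
```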

You also tell the compiler which registers you want your operands in. If variables are already in registers, the compiler doesn't have to spill them to memory just so your asm can load and store them; doing that would really be shooting yourself in the foot. The tutorial Brett Hale linked in comments hopefully covers all this.


Correct example of div with GNU C inline asm

You can see the compiler asm output for this on godbolt.

uint32_t q, m;  // uint32_t is unsigned int on every compiler that supports x86 inline asm with this syntax, but don't assume that in portable code.

asm ("divl %[bn2dat_j]\n"
      : "=a" (q), "=d" (m) // results are in eax, edx registers
      : "d" (0),           // zero edx for us, please
        "a" (bn1->dat[i]), // "a" means EAX / RAX
        [bn2dat_j] "mr" (bn2->dat[j]) // register or memory, compiler chooses which is more efficient
      : // no register clobbers, and we don't read/write "memory" other than operands
    );

"divl %4" would have worked too, but operand numbers shift when you add more inputs or outputs, while named operands keep their names.
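For reference, a self-contained version of the same statement, assuming a hypothetical struct bignum (the question's actual type isn't shown here) and a hypothetical wrapper name divmod_elem. The asm is only compiled for x86; elsewhere the sketch falls back to plain / and %. As with any div, the caller must ensure bn2->dat[j] is nonzero.

```c
#include <stdint.h>

/* Hypothetical stand-in for the question's bignum type. */
struct bignum {
    uint32_t dat[4];
};

/* Quotient and remainder of bn1->dat[i] / bn2->dat[j] from one divl on x86. */
static void divmod_elem(const struct bignum *bn1, const struct bignum *bn2,
                        int i, int j, uint32_t *qout, uint32_t *mout)
{
    uint32_t q, m;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("divl %[bn2dat_j]\n"
             : "=a" (q), "=d" (m)                  /* quotient in EAX, remainder in EDX */
             : "d" (0),                            /* high half of the dividend is zero */
               "a" (bn1->dat[i]),                  /* low half of the dividend */
               [bn2dat_j] "mr" (bn2->dat[j]));     /* divisor: register or memory */
#else
    q = bn1->dat[i] / bn2->dat[j];  /* portable fallback */
    m = bn1->dat[i] % bn2->dat[j];
#endif
    *qout = q;
    *mout = m;
}
```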
