Is there any way to compile Microsoft-style inline-assembly code on a Linux platform?


Question


As mentioned in the title, I'm wondering whether there is any way to compile Microsoft-style inline-assembly code (as shown below) on a Linux OS (e.g. Ubuntu).

_asm{
    mov edi, A;
    ....
    EMMS;
}

The sample code is part of an inline-assembly routine that compiles successfully on Windows 10 with the cl.exe compiler. Is there any way to compile it on Linux? Do I have to rewrite it in GNU C/C++ style (i.e. __asm__{;;;})?


Answer 1:


First of all, you should usually replace inline asm (with intrinsics or pure C) instead of porting it. https://gcc.gnu.org/wiki/DontUseInlineAsm


clang -fasm-blocks is mostly compatible with MSVC's inefficient inline asm syntax. But it doesn't support returning a value by leaving it in EAX and then falling off the end of a non-void function.

So you have to write inline asm that puts the value in a named C variable and return that, typically leading to an extra store/reload making MSVC syntax even worse. (Pretty bad unless you're writing a whole loop in asm that amortizes that store/reload overhead of getting data into / out of the asm block). See What is the difference between 'asm', '__asm' and '__asm__'? for a comparison of how inefficient MSVC inline-asm is when wrapping a single instruction. It's less dumb inside functions with stack args when those functions don't inline, but that only happens if you're already making things inefficient (e.g. using legacy 32-bit calling conventions and not using link-time optimization to inline small functions).

MSVC can substitute A with an immediate 1 when inlining into a caller, but clang can't. Both defeat constant-propagation but MSVC at least avoids bouncing constant inputs through a store/reload. (As long as you only use it with instructions that can support an immediate source operand.)

Clang accepts __asm, asm, or __asm__ to introduce an asm-block. MSVC accepts __asm (2 underscores like clang) or _asm (more commonly used, but clang doesn't accept it).

So for existing MSVC code you probably want #define _asm __asm so your code can compile with both MSVC and clang, unless you need to make separate versions anyway. Or use clang -D_asm=asm to set a CPP macro on the command line.
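For instance, a small compatibility shim along these lines (a sketch, not part of the original answer) lets the same source build with both compilers:

// Minimal sketch: map MSVC's _asm spelling onto a keyword clang accepts.
// MSVC already treats _asm as a keyword, so only define the macro elsewhere.
#ifndef _MSC_VER
#define _asm __asm
#endif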

Example: compile with MSVC or with clang -fasm-blocks

(Don't forget to enable optimization: clang -fasm-blocks -O3 -march=native -flto -Wall. Omit or modify -march=native if you want a binary that can run on earlier/other CPUs than your compile host.)

int a_global;

inline
long foo(int A, int B, int *arr) {
    int out;
    // You can't assume A will be in RDI: after inlining it prob. won't be
    __asm {
        mov   ecx, A                   // comment syntax
        add   dword ptr [a_global], 1
        mov   out, ecx
    }
    return out;
}

Compiling with x86-64 Linux clang 8.0 on Godbolt shows that clang can inline the wrapper function containing the inline-asm, and how much store/reload MSVC syntax entails (vs. GNU C inline asm which can take inputs and outputs in registers).

I'm using clang in Intel-syntax asm output mode, but it also compiles Intel-syntax asm blocks when it's outputting in AT&T syntax mode. (Normally clang compiles straight to machine-code anyway, which it also does correctly.)

## The x86-64 System V ABI passes args in rdi, rsi, rdx, ...
# clang -O3 -fasm-blocks -Wall
foo(int, int, int*):
        mov     dword ptr [rsp - 4], edi        # compiler-generated store of register arg to the stack

        mov     ecx, dword ptr [rsp - 4]        # start of inline asm
        add     dword ptr [rip + a_global], 1
        mov     dword ptr [rsp - 8], ecx        # end of inline asm

        movsxd  rax, dword ptr [rsp - 8]        # reload `out` with sign-extension to long (64-bit) : compiler-generated
        ret

Notice how the compiler substituted [rsp - 4] and [rsp - 8] for the C local variables A and out in the asm source block, and that a variable in static storage gets RIP-relative addressing. GNU C inline asm doesn't do this; you have to declare %[name] operands and tell the compiler where to put them.
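For comparison, a rough GNU-C-style version of the same block might look like the sketch below (my illustration, not part of the original answer): the %[name] operands let the compiler pick registers for A and out, so nothing has to be bounced through the stack.

// Sketch: GNU C inline asm with named operands, mirroring the __asm block above.
// Uses the same a_global; AT&T syntax (source, destination order).
static inline long foo_gnu(int A, int B, int *arr) {
    int out;
    asm ("addl $1, %[glob]\n\t"       // increment a_global in memory
         "movl %[a], %[out]"          // only here to mirror the original; plain C (out = A) would be better
         : [out] "=r" (out), [glob] "+m" (a_global)
         : [a] "r" (A));
    return out;
}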

We can even see clang inline that function twice into one caller, and optimize away the sign-extension to 64-bit because this function only returns int.

int caller() {
    return foo(1, 2, nullptr) + foo(1, 2, nullptr);
}
caller():                             # @caller()
        mov     dword ptr [rsp - 4], 1

        mov     ecx, dword ptr [rsp - 4]      # first inline asm
        add     dword ptr [rip + a_global], 1
        mov     dword ptr [rsp - 8], ecx

        mov     eax, dword ptr [rsp - 8]     # compiler-generated reload
        mov     dword ptr [rsp - 4], 1       # and store of A=1 again

        mov     ecx, dword ptr [rsp - 4]      # second inline asm
        add     dword ptr [rip + a_global], 1
        mov     dword ptr [rsp - 8], ecx

        add     eax, dword ptr [rsp - 8]     # compiler-generated reload
        ret

So we can see that just reading A from inline asm creates a missed-optimization: the compiler stores a 1 again even though the asm only read that input without modifying it.

I haven't done tests like assigning to or reading a_global before/between/after the asm statements to make sure the compiler "knows" that variable is modified by the asm statement.

I also haven't tested passing a pointer into an asm block and looping over the pointed-to array, to see if it's like a "memory" clobber in GNU C inline asm. I'd assume it is.
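For reference, this is how a GNU C asm statement spells that out explicitly; the sketch below (my illustration) uses the blunt "memory" clobber mentioned above.

// Sketch: writing through a pointer inside GNU C inline asm, with a "memory"
// clobber so the compiler assumes any memory may have been read or written.
void store_zero(int *p) {
    asm volatile ("movl $0, (%0)"     // AT&T syntax: *p = 0
                  :                   // no outputs
                  : "r" (p)
                  : "memory");
}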

My Godbolt link also includes an example of falling off the end of a non-void function with a value in EAX. That's supported by MSVC, but for clang it's UB as usual and it breaks when inlining into a caller (strangely with no warning, even at -Wall). You can see how x86 MSVC compiles it on my Godbolt link above.
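The pattern in question looks roughly like this (a sketch of the MSVC-only idiom, not code you should write for clang):

// MSVC-only idiom (sketch): leave the result in EAX and fall off the end.
// MSVC accepts this for functions containing __asm blocks; for clang it is
// undefined behaviour and breaks once the function is inlined into a caller.
int return_in_eax(int x) {
    __asm {
        mov eax, x
        add eax, 1
    }
    // no return statement: MSVC uses whatever the asm left in EAX
}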


https://gcc.gnu.org/wiki/DontUseInlineAsm

Porting MSVC asm to GNU C inline asm is almost certainly the wrong choice. Compiler support for optimizing intrinsics is very good, so you can usually get the compiler to generate good-quality efficient asm for you.

If you're going to do anything to existing hand-written asm, replacing it with pure C will usually be the most efficient, and certainly the most future-proof, path forward. Code that can auto-vectorize to wider vectors in the future is always good. But if you do need to manually vectorize for some tricky shuffling, then intrinsics are the way to go unless the compiler makes a mess of them somehow.

Look at the compiler-generated asm you get from intrinsics to make sure it's as good or better than the original.

If you're using MMX (the EMMS at the end of your block suggests you are), now is probably a good time to replace your MMX code with SSE2 intrinsics. SSE2 is baseline for x86-64, and few Linux systems are running obsolete 32-bit kernels.
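For example, a typical MMX add-loop maps directly onto SSE2 intrinsics and needs no EMMS at all. The sketch below is my illustration (function name and signature are made up), not the OP's code:

#include <stddef.h>      // size_t
#include <emmintrin.h>   // SSE2 intrinsics

// Sketch: add two int arrays 4 elements at a time with SSE2 instead of MMX.
// Assumes n is a multiple of 4; a real version needs a scalar cleanup loop.
void add_arrays(int *dst, const int *a, const int *b, size_t n) {
    for (size_t i = 0; i < n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(va, vb));
    }
}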




Answer 2:


Is there any way to compile Microsoft-style inline-assembly code on a Linux platform?

Yes, it is possible. Kind of.

For GCC you have to mix Intel and AT&T syntax in the same asm template. It does not work with Clang due to Issue 24232 (Inline assembly operands don't work with .intel_syntax) and Issue 39895 (Error: unknown token in expression using inline asm).

Here is the pattern. The assembler template uses .intel_syntax. Then, at the end of your asm template, you switch back to .att_syntax mode so it's in the right mode for the rest of the compiler-generated asm.

#include <cstddef>
int main(int argc, char* argv[])
{
    size_t ret = 1, N = 0;
    asm __volatile__
    (
        ".intel_syntax   noprefix ;\n"
        "xor esi, esi    ;\n"           // zero RSI
        "neg %1          ;\n"           // %1 is replaced with the operand location chosen by the compiler, in this case RCX
        "inc %1          ;\n"
        "push %1         ;\n"           // UNSAFE: steps on the red-zone
        "pop rax         ;\n"
        ".att_syntax     prefix ;\n"
        : "=a" (ret)      // output-only operand in RAX
          "+c" (N)        // read-write operand in RCX
        :                 // no read-only inputs
        : "%rsi"          // RSI is clobbered: input and output register constraints can't pick it
    );
    return (int)ret;
}

This won't work if you use any memory operands, because the compiler will substitute AT&T syntax 4(%rsp) into the template instead of [rsp + 4], for example.
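A sketch of that failure mode (my illustration of what not to do):

// Broken on purpose: the compiler substitutes an AT&T-style memory reference
// (e.g. "(%rdi)" or "-4(%rbp)") for the "m" operand, which the assembler
// rejects while it is still in .intel_syntax noprefix mode.
void broken(int *p) {
    asm (".intel_syntax noprefix \n\t"
         "inc dword ptr %0       \n\t"   // expands to e.g. "inc dword ptr (%rax)" -> assembler error
         ".att_syntax prefix"
         : "+m" (*p));
}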

This also only works if you don't compile with gcc -masm=intel; otherwise you'd be switching the assembler into AT&T mode while GCC itself is emitting Intel syntax. So hard-coding .intel_syntax noprefix in the template costs you the ability to build with either -masm dialect.


mov edi, A;

The code I help with does not use variables in the assembler the way you show. I don't know how well (or how poorly) it works with Intel-style ASM. I know MASM-style grammar is not supported.

You may be able to do it using asmSymbolicNames. See the GCC Extended ASM HowTo for details.

However, to convert to something GCC can consume, you only need to use positional arguments:

__asm__ __volatile__
(
    ".intel_syntax   noprefix ;\n"
    "mov edi, %0     \n";            // inefficient: use a "D" constraint instead of a mov
    ...
    ".att_syntax     prefix ;\n"
    : : "r" (A) : "%edi"
);

Or better, use a "D" constraint to ask for the variable in EDI / RDI in the first place. If a GNU C inline asm statement ever starts or ends with a mov, that's usually a sign you're doing it wrong.
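A sketch of that version (my illustration, same caveats as the example above): "+D" ties the operand to EDI/RDI, so no leading mov is needed in the template.

__asm__ __volatile__
(
    ".intel_syntax   noprefix ;\n"
    // ... body that expects its input in edi ...
    ".att_syntax     prefix ;\n"
    : "+D" (A)        // A is placed in EDI/RDI up front; no leading mov needed
    :                 // no other inputs
    :                 // list clobbers for any other registers the body writes
);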


Regarding asmSymbolicNames, here is what the GCC Extended ASM HowTo has to say about them:

This code makes no use of the optional asmSymbolicName. Therefore it references the first output operand as %0 (were there a second, it would be %1, etc). The number of the first input operand is one greater than that of the last output operand. In this i386 example, that makes Mask referenced as %1:

uint32_t Mask = 1234;
uint32_t Index;

  asm ("bsfl %1, %0"
     : "=r" (Index)
     : "r" (Mask)
     : "cc");

That code overwrites the variable Index (‘=’), placing the value in a register (‘r’). Using the generic ‘r’ constraint instead of a constraint for a specific register allows the compiler to pick the register to use, which can result in more efficient code. This may not be possible if an assembler instruction requires a specific register.

The following i386 example uses the asmSymbolicName syntax. It produces the same result as the code above, but some may consider it more readable or more maintainable since reordering index numbers is not necessary when adding or removing operands. The names aIndex and aMask are only used in this example to emphasize which names get used where. It is acceptable to reuse the names Index and Mask.

uint32_t Mask = 1234;
uint32_t Index;

  asm ("bsfl %[aMask], %[aIndex]"
     : [aIndex] "=r" (Index)
     : [aMask] "r" (Mask)
     : "cc");

The sample code is part of an inline-assembly routine that compiles successfully on Windows 10 with the cl.exe compiler...

Stepping back to 10,000 feet, if you are looking for something easy to use to integrate inline ASM like in Microsoft environments, then you don't have it on Linux. GCC inline ASM absolutely sucks. The GCC inline assembler is an archaic, difficult to use tool that I despise interacting with.

(And you have not experienced the incomprehensible error messages with bogus line information, yet).




Answer 3:


Peter's idea solved my problem. I just added a macro to my source file, in which every function consists of a single big Intel-syntax inline-asm block. The macro is shown below:

#define _asm \
        asm(".intel_syntax noprefix\n"); \
        asm

After that I compiled it with the command:

clang++ -c -fasm-blocks source.cpp

Then everything works.
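For the record, usage looks like this (a hypothetical example, not the OP's actual source): an existing MSVC-style block compiles unchanged once the macro is in place.

// Hypothetical usage: with the macro above, this builds with
//   clang++ -c -fasm-blocks source.cpp
int add_one(int x) {
    int out;
    _asm {
        mov eax, x
        add eax, 1
        mov out, eax
    }
    return out;
}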



Source: https://stackoverflow.com/questions/57186687/is-there-any-way-to-complie-a-microsoft-style-inline-assembly-code-on-a-linux-pl
