“Custom intrinsic” function for x64 instead of inline assembly possible?

痴心易碎 提交于 2019-12-05 04:58:11

Even VC 2005 is capable of producing code with sbb instruction.

for C code

bool __declspec(noinline) IsPowOf2(unsigned int a)
{
    return (a>=1)&((a&(a-1))<1);
}

compiles to the following

00401000  lea         eax,[ecx-1] 
00401003  and         eax,ecx 
00401005  cmp         eax,1 
00401008  sbb         eax,eax 
0040100A  neg         eax  
0040100C  cmp         ecx,1 
0040100F  sbb         ecx,ecx 
00401011  add         ecx,1 
00401014  and         eax,ecx 
00401016  ret          

No, you cannot implement any custom intrinsics, they are all built into the compiler. It is not only the instructions that are built in, but the compiler also knows the semantics of the intrinsic, and adapts the code for different surrounding code.

One reason for inline assembly being removed for x86-64 is that inserting assembly into the middle of a function disturbs the optimizer, and often results in less well optimized code around the assembler code. There can easily be a net loss there!

The only real use for intrinsics are for "interesting" special instructions that the compiler cannot generate from C or C++ constructs, like BSF or BSR. Most everything else will work better using inline functions, like your template above.

If you need to do something special, that the compiler does not understand, the only real option is to write the entire function as a separate assembler module. If the call overhead for that function is too expensive, the optimization probably wasn't worth that much in the first place.

Trust your compiler(tm)!

VC10 x64 intrinsics would not be of great help in this simple case. The dynamic branching you have is due to the && operator which is an early out operator. In many cases(your case is a perfect example) it's better to avoid branching by computing the result for all branches then apply a mask to select the good one. A cpp code with masking would look like :

template<typename T_Type>
inline bool isPowerOfTwo(T_Type const& x)
{
    // static type checking for the example
    static_assert( std::is_integral<T_Type>::value && std::is_unsigned<T_Type>::value, "limited to unsigned types for the example" );
    typedef std::make_signed<T_Type>::type s_Type;

    // same as yours but with no branching
    return bool(  ((s_Type( s_Type(x != 0) << (s_Type(sizeof(T_Type)<<3u)-1) )) >> (s_Type(s_Type(sizeof(T_Type)<<3u)-1)))  & ((x & (x - 1)) == 0)  );
}

In the above code I'm not checking if the number is negative or not for signed types. Again a simple mask will do the trick by performing an arithmetic shift to right (numBit-1) times to get a value of (~0) for negative numbers and 0 for positive ones

The only way forward is to step back a bit and start looking at the larger picture. Either stop implementing micro-optimised API or progress onto making larger API calls all optimised in MASM64, YASM, NASM, etc.

If you use one of the more powerful assemblers you can turn the small functions into macros, so basically change your C/C++ header based inline assembler function into an assembler include file.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!