What's the proper way to use different versions of SSE intrinsics in GCC?

旧巷少年郎 2020-12-29 09:12

I'll ask my question by way of an example. I have a function called do_something().

It has three versions: do_something(), do_s

4 answers
  • 2020-12-29 09:32

    I think Mystical's tip is fine, but if you really want to do it all in one file, you can use the appropriate pragmas, for instance:

    #pragma GCC target("sse4.1")
    

    This needs GCC 4.4 or later, AFAIR.
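
To make the pragma approach concrete, here is a minimal sketch of one file holding both a baseline function and an SSE4.1 function. The names mul4_scalar and mul4_sse41 are made up for illustration, and I'm assuming GCC 4.9 or later so that immintrin.h can be included before the pragma. The push_options/pop_options pair keeps the target change from leaking into the rest of the file.

```cpp
#include <cstdint>
#include <immintrin.h>

// Baseline version: built with whatever -m flags the file is compiled with.
// (Hypothetical example function, not from the question.)
void mul4_scalar(const std::int32_t* a, const std::int32_t* b, std::int32_t* out)
{
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] * b[i];
}

#pragma GCC push_options
#pragma GCC target("sse4.1")   // applies to the functions defined until pop_options

// SSE4.1 version: _mm_mullo_epi32 is an SSE4.1 intrinsic.
void mul4_sse41(const std::int32_t* a, const std::int32_t* b, std::int32_t* out)
{
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_mullo_epi32(va, vb));
}

#pragma GCC pop_options
```

The caller still has to check at run time that the CPU supports SSE4.1 before calling mul4_sse41; the pragma only affects code generation.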

  • 2020-12-29 09:33

    If you are using GCC 4.9 or above on an i686 or x86_64 machine, then you are supposed to be able to use intrinsics regardless of your -march=XXX and -mXXX options. You could write your do_something() accordingly:

    void do_something()
    {
        byte temp[18];
    
        // Test the newer instruction set first: every CPU with SSSE3 also has
        // SSE2, so testing SSE2 first would make the SSSE3 branch unreachable.
        if (HasSSSE3())
        {
            const __m128i MASK = _mm_set_epi8(12,13,14,15, 8,9,10,11, 4,5,6,7, 0,1,2,3);
            _mm_storeu_si128(reinterpret_cast<__m128i*>(temp),
               _mm_shuffle_epi8(_mm_loadu_si128((const __m128i*)(ptr)), MASK));
        }
        else if (HasSSE2())
        {
            const __m128i i = _mm_loadu_si128((const __m128i*)(ptr));
            ...
        }
        else
        {
            // Do the byte swap/endian reversal manually
            ...
        }
    }
    

    You have to supply HasSSE2(), HasSSSE3() and friends yourself. Also see the question "Intrinsics for CPUID like informations?".
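
One possible way to write those helpers, as a rough sketch: GCC (4.8 and later, as far as I know) provides the builtins __builtin_cpu_init() and __builtin_cpu_supports() on x86. This is just one option, not necessarily what the linked question recommends, and it is not portable to other compilers such as MSVC.

```cpp
// GCC-specific sketch for x86 targets.
// __builtin_cpu_init() is only strictly required when the check runs before
// static constructors; calling it again is harmless.
inline bool HasSSE2()
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
}

inline bool HasSSSE3()
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("ssse3") != 0;
}
```

Since each call queries the CPU model data again, you may want to cache the results in static variables if the checks sit on a hot path.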

    Also see GCC Issue 57202, "Please make the intrinsics headers like immintrin.h be usable without compiler flags". However, I don't believe the feature works: I regularly encounter compile failures because GCC does not make the intrinsics available.
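
One likely cause of such failures: with GCC 4.9 or later the headers can be included without -m flags, but a function that actually calls an intrinsic still has to be compiled for that instruction set, for example through a target attribute. The sketch below assumes that setup; the bswap32x4_* names are hypothetical, and the two-pointer signature is my own choice so the example is self-contained.

```cpp
#include <cstdint>
#include <immintrin.h>

// Compiled as if -mssse3 were given, for this one function only.
__attribute__((target("ssse3")))
void bswap32x4_ssse3(const std::uint8_t* src, std::uint8_t* dst)
{
    const __m128i mask = _mm_set_epi8(12,13,14,15, 8,9,10,11, 4,5,6,7, 0,1,2,3);
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), _mm_shuffle_epi8(v, mask));
}

// Plain C++ fallback: reverses the bytes inside each of the four 32-bit words.
// src and dst must not overlap.
void bswap32x4_scalar(const std::uint8_t* src, std::uint8_t* dst)
{
    for (int w = 0; w < 4; ++w)
        for (int b = 0; b < 4; ++b)
            dst[4 * w + b] = src[4 * w + 3 - b];
}

// Built for the baseline target; picks an implementation at run time.
void bswap32x4(const std::uint8_t* src, std::uint8_t* dst)
{
    if (__builtin_cpu_supports("ssse3"))
        bswap32x4_ssse3(src, dst);
    else
        bswap32x4_scalar(src, dst);
}
```

Without the attribute, GCC typically rejects the _mm_shuffle_epi8 call with an "inlining failed ... target specific option mismatch" error unless the whole file is built with -mssse3.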

  • 2020-12-29 09:49

    Here is an example of compiling a separate object file for each optimization setting: http://notabs.org/lfsr/software/index.htm

    But even this method fails when gcc link time optimization (-flto) is used. So how can a single executable be built with full optimization for different processors? The only solution I can find is to use include directives to make the C files behave as a single compilation unit so that -flto is not needed. Here is an example using that method: http://notabs.org/blcutil/index.htm

  • I think you want to build what's called a "CPU dispatcher". I got one working (as far as I know) for GCC but have not got it to work with Visual Studio. See my question "cpu dispatcher for visual studio for AVX and SSE".

    I would check out Agner Fog's vectorclass and the file dispatch_example.cpp: http://www.agner.org/optimize/#vectorclass

    g++ -O3 -msse2   -c dispatch_example.cpp -od2.o
    g++ -O3 -msse4.1 -c dispatch_example.cpp -od5.o
    g++ -O3 -mavx    -c dispatch_example.cpp -od8.o
    g++ -O3 -msse2      instrset_detect.cpp d2.o d5.o d8.o
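
For a rough idea of what a dispatcher does, here is a single-file sketch. It is not a copy of dispatch_example.cpp: instead of compiling the same source several times with different -m flags, it uses GCC target attributes and __builtin_cpu_supports(), and all names (add_arrays, add_ptr and so on) are made up. The function pointer starts out pointing at the dispatcher, which picks the best version on the first call and then gets out of the way.

```cpp
#include <cstddef>
#include <immintrin.h>

typedef void (*add_fn)(const float*, const float*, float*, std::size_t);

// Baseline version, no intrinsics.
static void add_generic(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

__attribute__((target("sse2")))
static void add_sse2(const float* a, const float* b, float* out, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)   // 4 floats per 128-bit register
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; ++i) out[i] = a[i] + b[i];   // tail
}

__attribute__((target("avx")))
static void add_avx(const float* a, const float* b, float* out, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)   // 8 floats per 256-bit register
        _mm256_storeu_ps(out + i, _mm256_add_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i)));
    for (; i < n; ++i) out[i] = a[i] + b[i];   // tail
}

static void add_dispatch(const float* a, const float* b, float* out, std::size_t n);

// Points at the dispatcher until the first call, then at the chosen version.
static add_fn add_ptr = add_dispatch;

static void add_dispatch(const float* a, const float* b, float* out, std::size_t n)
{
    if (__builtin_cpu_supports("avx"))       add_ptr = add_avx;
    else if (__builtin_cpu_supports("sse2")) add_ptr = add_sse2;
    else                                     add_ptr = add_generic;
    add_ptr(a, b, out, n);   // forward the first call
}

// Public entry point: one indirect call, no feature check after the first call.
void add_arrays(const float* a, const float* b, float* out, std::size_t n)
{
    add_ptr(a, b, out, n);
}
```

The pointer update is not synchronised here, so a real multi-threaded program would want to resolve it once at start-up or use an atomic.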
    