Optimization barrier for microbenchmarks in MSVC: tell the optimizer you clobber memory?

旧时难觅i 2021-01-03 21:10

Chandler Carruth introduced two functions in his CppCon 2015 talk that can be used for fine-grained inhibition of the optimizer. They are useful to write micro-benchma

2 Answers
  • 2021-01-03 21:31

    Given your approximation of escape(), the following approximation of clobber() should work as well. Note that this is a draft idea: part of the solution is deferred to the implementation of nextLocationToClobber().

    // Always returns false, but in a way the optimizer cannot deduce.
    bool isClobberingEnabled();
    
    // The challenge is to implement this function so that even the
    // smartest optimizer believes it can return a valid pointer to
    // anywhere in the heap, the stack, or static memory.
    volatile char* nextLocationToClobber();
    
    const bool clobberingIsEnabled = isClobberingEnabled();
    volatile char* clobberingPtr;
    
    inline void clobber() {
        if ( clobberingIsEnabled ) {
            // This is never executed, but the compiler
            // cannot know that.
            clobberingPtr = nextLocationToClobber();
            *clobberingPtr = *clobberingPtr;
        }
    }
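To make the wiring concrete, here is a minimal sketch of clobber() placed inside a benchmark body. The definitions of isClobberingEnabled() and nextLocationToClobber() below are hypothetical stand-ins (a volatile flag and a dummy static byte) that exist only so the example links; they are assumptions, not the implementations this answer has in mind, and sum_with_clobber() is an illustrative name. The sketch shows where the call goes; it does not demonstrate that any particular optimizer is actually fooled.

```cpp
#include <cassert>

// Hypothetical stand-in: reads a volatile flag, so the compiler has to treat
// the result as unknown at compile time. The flag is never set to true.
bool isClobberingEnabled() {
    static volatile bool enabled = false;
    return enabled;
}

// Hypothetical stand-in: never called at run time because the flag is false.
// A real implementation must look as if it could point anywhere in memory.
volatile char* nextLocationToClobber() {
    static volatile char dummy = 0;
    return &dummy;
}

const bool clobberingIsEnabled = isClobberingEnabled();
volatile char* clobberingPtr;

inline void clobber() {
    if (clobberingIsEnabled) {
        // Never executed, but the compiler cannot know that.
        clobberingPtr = nextLocationToClobber();
        *clobberingPtr = *clobberingPtr;
    }
}

// Illustrative benchmark body: store, clobber, then read the value back.
int sum_with_clobber(int iterations) {
    assert(iterations >= 0);
    int buffer[4] = {0, 0, 0, 0};
    int total = 0;
    for (int i = 0; i < iterations; ++i) {
        buffer[0] = i;       // the store the benchmark wants to keep
        clobber();           // intended to make memory look modified
        total += buffer[0];  // the load that should not be folded away
    }
    return total;
}
```

Because the flag is false, clobber() has no run-time effect, and sum_with_clobber(3) returns 0 + 1 + 2 = 3.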
    

    UPDATE

    Question: How would you ensure that isClobberingEnabled returns false "in an undeducible way"? Certainly it would be trivial to place the definition in another translation unit, but the minute you enable LTCG, that strategy is defeated. What did you have in mind?

    Answer: We can take advantage of a hard-to-prove property from number theory, for example, Fermat's Last Theorem:

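    #include <cstdint>  // std::uint32_t, std::uintptr_t
    #include <cstdlib>  // std::rand
    #include <ctime>    // std::clock

    bool undeducible_false() {
        // It took mathematicians more than three centuries to prove Fermat's
        // Last Theorem in its most general form. That knowledge has hardly
        // been built into compilers, and a compiler is unlikely to try hard
        // enough to check all one million possible combinations below.
    
        // Caveat: avoid integer overflow (Fermat's theorem doesn't hold in
        //         modular arithmetic). With a, b, c in [1, 100], each side
        //         is at most 2,000,000, which fits in 32 bits.
        // The cast comes before % so that a failed std::clock() (which
        // returns -1) still maps into [1, 100] instead of yielding 0.
        std::uint32_t a = static_cast<std::uint32_t>(std::clock()) % 100 + 1;
        std::uint32_t b = static_cast<std::uint32_t>(std::rand()) % 100 + 1;
        std::uint32_t c = reinterpret_cast<std::uintptr_t>(&a) % 100 + 1;
    
        return a*a*a + b*b*b == c*c*c;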
    }
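The two claims the function relies on can be checked by brute force: that no a, b, c in [1, 100] satisfy a³ + b³ = c³, and that the sums never overflow 32 bits. The sketch below does exactly that. The names kMaxOperand, kMaxSum, count_cubic_solutions and check_premise are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Upper bound of the range produced by `x % 100 + 1`.
constexpr std::uint32_t kMaxOperand = 100;

// Largest value a*a*a + b*b*b can reach. It must fit in 32 bits, otherwise
// the comparison would be done modulo 2^32 and the theorem would not apply.
constexpr std::uint64_t kMaxSum =
    2ull * kMaxOperand * kMaxOperand * kMaxOperand;
static_assert(kMaxSum <= std::numeric_limits<std::uint32_t>::max(),
              "sum of cubes would overflow std::uint32_t");

// Counts triples (a, b, c) in [1, 100]^3 with a^3 + b^3 == c^3,
// i.e. the one million combinations mentioned in the comment.
std::uint32_t count_cubic_solutions() {
    std::uint32_t count = 0;
    for (std::uint32_t a = 1; a <= kMaxOperand; ++a) {
        for (std::uint32_t b = 1; b <= kMaxOperand; ++b) {
            for (std::uint32_t c = 1; c <= kMaxOperand; ++c) {
                if (a * a * a + b * b * b == c * c * c) {
                    ++count;
                }
            }
        }
    }
    return count;
}

// The function can only return true if at least one such triple exists.
void check_premise() {
    assert(count_cubic_solutions() == 0);
}
```

This also shows the limit of the trick: the search space is small enough that an optimizer could, in principle, enumerate it. The approach bets that none will.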
    
  • 2021-01-03 21:45

    I have used the following in place of escape.

    #ifdef _MSC_VER
    #pragma optimize("", off)
    template <typename T>
    inline void escape(T* p) {
        *reinterpret_cast<char volatile*>(p) =
            *reinterpret_cast<char const volatile*>(p); // thanks, @milleniumbug
    }
    #pragma optimize("", on)
    #endif
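As a usage sketch, the snippet below calls this escape() on a local variable. It assumes the same function body, with the pragmas guarded separately so the example also compiles on compilers other than MSVC (where the volatile access alone has to do the work). store_and_escape() is a hypothetical name used only for illustration.

```cpp
#include <cassert>

#ifdef _MSC_VER
#pragma optimize("", off)
#endif
// Same body as above: a volatile read and write of the first byte of *p,
// which leaves the stored value unchanged.
template <typename T>
inline void escape(T* p) {
    assert(p != nullptr);
    *reinterpret_cast<char volatile*>(p) =
        *reinterpret_cast<char const volatile*>(p);
}
#ifdef _MSC_VER
#pragma optimize("", on)
#endif

// Illustrative benchmark body: the object is initialized before escape()
// reads its first byte, so no indeterminate value is read.
int store_and_escape(int value) {
    int slot = value;
    escape(&slot);  // the volatile access must be emitted by the compiler
    return slot;
}
```

Since the byte is written back unchanged, store_and_escape(42) returns 42.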
    

    It's not perfect but it's close enough, I think.

    Sadly, I don't have a way to emulate clobber.
