Intriguing assembly for comparing std::optional of primitive types

问题

Valgrind picked up a flurry Conditional jump or move depends on uninitialised value(s) in one of my unit tests.

Inspecting the assembly, I realized that the following code:

bool operator==(MyType const& left, MyType const& right) {
    // ... some code ...
    if (left.getA() != right.getA()) { return false; }
    // ... some code ...
    return true;
}

Where MyType::getA() const -> std::optional<std::uint8_t>, generated the following assembly:

   0x00000000004d9588 <+108>:   xor    eax,eax
   0x00000000004d958a <+110>:   cmp    BYTE PTR [r14+0x1d],0x0
   0x00000000004d958f <+115>:   je     0x4d9597 <... function... +123>
x  0x00000000004d9591 <+117>:   mov    r15b,BYTE PTR [r14+0x1c]
x  0x00000000004d9595 <+121>:   mov    al,0x1

   0x00000000004d9597 <+123>:   xor    edx,edx
   0x00000000004d9599 <+125>:   cmp    BYTE PTR [r13+0x1d],0x0
   0x00000000004d959e <+130>:   je     0x4d95ae <... function... +146>
x  0x00000000004d95a0 <+132>:   mov    dil,BYTE PTR [r13+0x1c]
x  0x00000000004d95a4 <+136>:   mov    dl,0x1
x  0x00000000004d95a6 <+138>:   mov    BYTE PTR [rsp+0x97],dil

   0x00000000004d95ae <+146>:   cmp    al,dl
   0x00000000004d95b0 <+148>:   jne    0x4da547 <... function... +4139>

   0x00000000004d95b6 <+154>:   cmp    r15b,BYTE PTR [rsp+0x97]
   0x00000000004d95be <+162>:   je     0x4d95c8 <... function... +172>

    => Jump on uninitialized

   0x00000000004d95c0 <+164>:   test   al,al
   0x00000000004d95c2 <+166>:   jne    0x4da547 <... function... +4139>

Where I marked with x the statements that are not executed (jumped over) in the case where the optional is NOT set.

The member A here is at offset 0x1c into MyType. Checking the layout of std::optional we see that:

+0x1d corresponds to bool _M_engaged,
+0x1c corresponds to std::uint8_t _M_payload (inside an anonymous union).

The code of interest for std::optional is:

constexpr explicit operator bool() const noexcept
{ return this->_M_is_engaged(); }

// Comparisons between optional values.
template<typename _Tp, typename _Up>
constexpr auto operator==(const optional<_Tp>& __lhs, const optional<_Up>& __rhs) -> __optional_relop_t<decltype(declval<_Tp>() == declval<_Up>())>
{
    return static_cast<bool>(__lhs) == static_cast<bool>(__rhs)
         && (!__lhs || *__lhs == *__rhs);
}

Here, we can see that gcc transformed the code quite substantially; if I understand it correctly, in C this gives:

char rsp[0x148]; // simulate the stack

/* comparisons of prior data members */

/*
0x00000000004d9588 <+108>:   xor    eax,eax
0x00000000004d958a <+110>:   cmp    BYTE PTR [r14+0x1d],0x0
0x00000000004d958f <+115>:   je     0x4d9597 <... function... +123>
0x00000000004d9591 <+117>:   mov    r15b,BYTE PTR [r14+0x1c]
0x00000000004d9595 <+121>:   mov    al,0x1
*/

int eax = 0;
if (__lhs._M_engaged == 0) { goto b123; }
bool r15b = __lhs._M_payload;
eax = 1;

b123:
/*
0x00000000004d9597 <+123>:   xor    edx,edx
0x00000000004d9599 <+125>:   cmp    BYTE PTR [r13+0x1d],0x0
0x00000000004d959e <+130>:   je     0x4d95ae <... function... +146>
0x00000000004d95a0 <+132>:   mov    dil,BYTE PTR [r13+0x1c]
0x00000000004d95a4 <+136>:   mov    dl,0x1
0x00000000004d95a6 <+138>:   mov    BYTE PTR [rsp+0x97],dil
*/

int edx = 0;
if (__rhs._M_engaged == 0) { goto b146; }
rdi = __rhs._M_payload;
edx = 1;
rsp[0x97] = rdi;

b146:
/*
0x00000000004d95ae <+146>:   cmp    al,dl
0x00000000004d95b0 <+148>:   jne    0x4da547 <... function... +4139>
*/

if (eax != edx) { goto end; } // return false

/*
0x00000000004d95b6 <+154>:   cmp    r15b,BYTE PTR [rsp+0x97]
0x00000000004d95be <+162>:   je     0x4d95c8 <... function... +172>
*/

//  Flagged by valgrind
if (r15b == rsp[097]) { goto b172; } // next data member

/*
0x00000000004d95c0 <+164>:   test   al,al
0x00000000004d95c2 <+166>:   jne    0x4da547 <... function... +4139>
*/

if (eax == 1) { goto end; } // return false

b172:

/* comparison of following data members */

end:
    return false;

Which is equivalent to:

//  Note how the operands of || are inversed.
return static_cast<bool>(__lhs) == static_cast<bool>(__rhs)
         && (*__lhs == *__rhs || !__lhs);

I think that the assembly is correct, if strange. That is, as far as I can see, the result of the comparison between uninitialized values does not actually influence the result of the function (and unlike C or C++, I do expect comparing junk in x86 assembly NOT to be UB):

If one optional is nullopt and the other is set, then the conditional jump at +148 jumps to end (return false), OK.
If both optionals are set, then the comparison reads initialized values, OK.

So the only case of interest is when both optionals are nullopt:

if the values compare equal, then the code concludes that the optionals are equal, which is true since they are both nullopt,
otherwise, the code concludes that the optionals are equal if __lhs._M_engaged is false, which is true.

In either case, the code therefore concludes that both optionals are equal when both are nullopt; CQFD.

This is the first instance I see of gcc generating apparently "benign" uninitialized reads, and therefore I have a few questions:

Are uninitialized reads OK in assembly (x84_64)?
Is this the syndrome of a failed optimization (reversing ||) which could trigger in non-benign circumstances?

For now, I am leaning toward annotating the few functions with optimize(1) as a work-around to prevent optimizations from kicking in. Fortunately, the identified functions are not performance critical.

Environment:

compiler: gcc 7.3
compile flags: -std=c++17 -g -Wall -Werror -O3 -flto (+ appropriate includes)
link flags: -O3 -flto (+ appropriate libraries)

Note: can appear with -O2 instead of -O3, but never without -flto.

Fun facts

In the full code, this pattern appears 32 times in the function outlined above, for various payloads : std::uint8_t, std::uint32_t, std::uint64_t and even a struct { std::int64_t; std::int8_t; }.

It only appears in a few big operator== comparing types with ~40 data members, not in smaller ones. And it does not appear for the std::optional<std::string_view> even in those specific functions (which call into std::char_traits for the comparison).

Finally, infuriatingly, isolating the function in question in its own binary makes the "problem" disappear. The mythical MCVE is proving elusive.

回答1:

In x86 asm, the worst that happens is that a single register has an unknown value (or you don't know which of two possible values it has, old or new, in case of possible memory-ordering). But if your code doesn't depend on that register value, you're fine, unlike in C++. C++ UB means your whole program is in theory completely hosed after one signed-integer overflow, and even before that along code-paths that the compiler can see will lead to UB. Nothing like that ever happens in asm, at least not in unprivileged user-space code.

(There might be a few things you can do to basically cause system-wide unpredictable behaviour in the kernel, by setting control registers in weird ways or putting inconsistent things into page tables or descriptors, but that's not going to happen from something like this, even if you were compiling kernel code.)

Some ISAs have "unpredictable behaviour", like early ARM if you use the same register for multiple operands of a multiply, the behaviour is unpredictable. IDK if this allows breaking the pipeline and corrupting other registers, or if it's restricted to just an unexpected multiply result. The latter would be my guess.

Or MIPS, if you put a branch in the branch-delay slot, the behaviour is unpredictable. (Handling exceptions is messy because of branch-delay slots...). But presumably there are still limits and you can't crash the machine or break other processes (in a multi-user system like Unix, it would be bad if an unprivileged user-space process could break anything for other users).

Very early MIPS also had load-delay slots, and multiply delay slots: you couldn't use the result of a load in the next instruction. Presumably you might get the old value of the register if you read it too early, or maybe just garbage. MIPS = Minimally Interlocked Pipeline Stages; they wanted to offload the stalling to software, but it turned out that adding a NOP when the compiler couldn't find anything useful to do next bloated binaries and led to slower overall code vs. having the hardware stall when necessary. But we're stuck with branch-delay slots because removing them would change the ISA, unlike relaxing a restriction on something early software didn't do.

回答2:

There are no trap values in x86 integer formats, so reading and comparing uninitialized values generates unpredictable truth/false values and no other direct harm.

In a cryptographic context, the state of the uninitialized values causing a different branch to be taken could leak into timing information leaking or other side-channel attacks. But cryptographic hardening is probably not what you are worried about.

The fact that gcc does uninitialized reads when it doesn't matter if the read gives the wrong value doesn't mean it will do it when it matters.

回答3:

I would not be so sure that it is caused by compiler error. Possibly there is some UB in Your code which allows compiler to optimize more aggressivelly Your code. Anyway, to the questions:

UB is not an issue in assembly. In most general case, what is left under the address You are refering to will be read. Of course most OSes fill memory pages before giving them to program, but your variable most probably resides on stack, so most probably it contains garbage data. Soo, as long You are ok with random data comparison (which is quite bad, as may spuriously give different results) assembly is valid
Most probably it is syndrome of reversed comparison

来源：https://stackoverflow.com/questions/51616179/intriguing-assembly-for-comparing-stdoptional-of-primitive-types

标签

c++

gcc

assembly

x86-64

c++17