Explanation of the UB while changing data

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-25 04:08:27

Question


I was trying to demonstrate to a work pal that you can change the value of a const-qualified variable if you really want to (and know how to) by using some trickery. During my demonstration, I discovered that there are two "flavours" of constant values: the ones that you cannot change whatever you do, and the ones that you can change by using dirty tricks.

A constant value is unchangeable when the compiler uses the literal value instead of the value stored on the stack (as I read here). Here is a piece of code that shows what I mean:

// TEST 1
#include <cstring>   // std::memcpy
#include <iostream>  // std::cout, std::dec, std::hex

// Prints address and value of the const object (cv) and of the
// non-const alias (ncv) that is used to modify it
#define LOG(index, cv, ncv) std::cout \
    << std::dec << index << ".- Address = " \
    << std::hex << &cv << "\tValue = " << cv << '\n' \
    << std::dec << index << ".- Address = " \
    << std::hex << &ncv << "\tValue = " << ncv << '\n'

int main()
{
    const unsigned int const_value = 0xcafe01e;

    // Try with no-const reference
    unsigned int &no_const_ref = const_cast<unsigned int &>(const_value);
    no_const_ref = 0xfabada;
    LOG(1, const_value, no_const_ref);

    // Try with no-const pointer
    unsigned int *no_const_ptr = const_cast<unsigned int *>(&const_value);
    *no_const_ptr = 0xb0bada;
    LOG(2, const_value, (*no_const_ptr));

    // Try with c-style cast
    no_const_ptr = (unsigned int *)&const_value;
    *no_const_ptr = 0xdeda1;
    LOG(3, const_value, (*no_const_ptr));

    // Try with memcpy
    unsigned int brute_force = 0xba51c;
    std::memcpy(no_const_ptr, &brute_force, sizeof(const_value));
    LOG(4, const_value, (*no_const_ptr));

    // Try with union (in C++, reading the member not last written is UB too)
    union bad_idea
    {
        const unsigned int *const_ptr;
        unsigned int *no_const_ptr;
    } u;

    u.const_ptr = &const_value;
    *u.no_const_ptr = 0xbeb1da;
    LOG(5, const_value, (*u.no_const_ptr));

    return 0;
}

This produces the following output:

1.- Address = 0xbfffbe2c    Value = cafe01e
1.- Address = 0xbfffbe2c    Value = fabada
2.- Address = 0xbfffbe2c    Value = cafe01e
2.- Address = 0xbfffbe2c    Value = b0bada
3.- Address = 0xbfffbe2c    Value = cafe01e
3.- Address = 0xbfffbe2c    Value = deda1
4.- Address = 0xbfffbe2c    Value = cafe01e
4.- Address = 0xbfffbe2c    Value = ba51c
5.- Address = 0xbfffbe2c    Value = cafe01e
5.- Address = 0xbfffbe2c    Value = beb1da

Since I'm relying on UB (changing the value of const data), it is expected that the program acts weird; but this weirdness is more than I was expecting.

Let's suppose that the compiler is using the literal value. Then, when the code reaches the instruction that changes the value of the constant (by reference, by pointer or with memcpy), it simply ignores the order as long as the value is a literal (it is undefined behaviour, after all). This explains why the value remains unchanged, but:

  • Why do both variables have the same memory address while the contained values differ?

AFAIK the same memory address cannot hold two different values, so one of the outputs is lying:

  • What's really happening? Which memory address is the fake one (if any)?

By making a few changes to the code above, we can try to avoid the use of the literal value, so that the trickery does its work (source here):

// TEST 2
// Requires <cstring>, <iostream> and the LOG macro from TEST 1
// Try with no-const reference
void change_with_no_const_ref(const unsigned int &const_value)
{
    unsigned int &no_const_ref = const_cast<unsigned int &>(const_value);
    no_const_ref = 0xfabada;
    LOG(1, const_value, no_const_ref);    
}

// Try with no-const pointer
void change_with_no_const_ptr(const unsigned int &const_value)
{
    unsigned int *no_const_ptr = const_cast<unsigned int *>(&const_value);
    *no_const_ptr = 0xb0bada;
    LOG(2, const_value, (*no_const_ptr));
}

// Try with c-style cast
void change_with_cstyle_cast(const unsigned int &const_value)
{
    unsigned int *no_const_ptr = (unsigned int *)&const_value;
    *no_const_ptr = 0xdeda1;
    LOG(3, const_value, (*no_const_ptr));
}

// Try with memcpy
void change_with_memcpy(const unsigned int &const_value)
{
    unsigned int *no_const_ptr = const_cast<unsigned int *>(&const_value);
    unsigned int brute_force = 0xba51c;
    std::memcpy(no_const_ptr, &brute_force, sizeof(const_value));
    LOG(4, const_value, (*no_const_ptr));
}

// Try with union
void change_with_union(const unsigned int &const_value)
{
    union bad_idea
    {
        const unsigned int *const_ptr;
        unsigned int *no_const_ptr;
    } u;

    u.const_ptr = &const_value;
    *u.no_const_ptr = 0xbeb1da;
    LOG(5, const_value, (*u.no_const_ptr));
}

int main(int argc, char **argv)
{
    unsigned int value = 0xcafe01e;
    change_with_no_const_ref(value);
    change_with_no_const_ptr(value);
    change_with_cstyle_cast(value);
    change_with_memcpy(value);
    change_with_union(value);

    return 0;
}

Which produces the following output:

1.- Address = 0xbff0f5dc    Value = fabada
1.- Address = 0xbff0f5dc    Value = fabada
2.- Address = 0xbff0f5dc    Value = b0bada
2.- Address = 0xbff0f5dc    Value = b0bada
3.- Address = 0xbff0f5dc    Value = deda1
3.- Address = 0xbff0f5dc    Value = deda1
4.- Address = 0xbff0f5dc    Value = ba51c
4.- Address = 0xbff0f5dc    Value = ba51c
5.- Address = 0xbff0f5dc    Value = beb1da
5.- Address = 0xbff0f5dc    Value = beb1da

As we can see, the const-qualified variable was changed on each change_with_* call, and apart from that the behaviour is the same as before. So I was tempted to assume that the weird address/value mismatch only appears when the const data is used as a literal instead of being read as a value from memory.

So, to check this assumption, I made one last test, changing unsigned int value in main to const unsigned int value:

// TEST 3
// Body of main() from TEST 2, with value now declared const
const unsigned int value = 0xcafe01e;
change_with_no_const_ref(value);
change_with_no_const_ptr(value);
change_with_cstyle_cast(value);
change_with_memcpy(value);
change_with_union(value);

Surprisingly, the output is the same as in TEST 2 (code here), so I suppose that the data is passed as a variable rather than as a literal value because it is used as a parameter. This makes me wonder:

  • What makes the compiler decide to optimize a const value into a literal value?

In brief, my questions are:

  • In TEST 1:
    • Why do the const value and the non-const value share the same memory address while their contained values differ?
    • What steps does the program follow to produce this output? Which memory address is the fake one (if any)?
  • In TEST 3:
    • What makes the compiler decide to optimize a const value into a literal value?

Answer 1:


In general, it is pointless to analyse Undefined Behaviour, because there is no guarantee that you can transfer the results of your analysis to a different program.

In this case, the behaviour can be explained by assuming the compiler has applied the optimisation technique called constant propagation. In that technique, wherever the program uses the value of a const variable whose value the compiler knows, the compiler replaces that use with the value itself (as it is known at compile time). Other uses of the variable, such as taking its address, are not replaced.
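To make this concrete, here is a hypothetical, hand-written model of what the first LOG call of TEST 1 amounts to after constant propagation. It is not real compiler output, and the names LoggedPair, model_test1_pair1 and usage_model are invented for this sketch. The object is declared non-const here so that the sketch itself has no undefined behaviour; the comments say which expression stands in for which use of const_value.

```cpp
#include <cassert>

// Hypothetical model (NOT real compiler output) of TEST 1's first LOG call
// after constant propagation.
struct LoggedPair {
    bool same_address;    // do both printed lines show the same address?
    unsigned int first;   // value printed for const_value
    unsigned int second;  // value printed for no_const_ref
};

LoggedPair model_test1_pair1() {
    unsigned int storage = 0xcafe01e;      // the memory const_value occupies
    unsigned int &no_const_ref = storage;  // stands in for the const_cast
    no_const_ref = 0xfabada;               // the store really reaches memory

    return {
        &storage == &no_const_ref,  // address uses are NOT replaced
        0xcafe01eu,                 // value use replaced by the known literal
        no_const_ref                // read through the non-const path: memory
    };
}

// Usage: reproduces the shape of the "1.-" lines in TEST 1's output.
void usage_model() {
    LoggedPair p = model_test1_pair1();
    assert(p.same_address);
    assert(p.first == 0xcafe01eu);
    assert(p.second == 0xfabadau);
}
```

Both lines agree on the address because there is only one object; they disagree on the value because only the second line actually reads it.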

This optimisation is valid, precisely because changing a variable that was defined as const results in Undefined Behaviour and the compiler is allowed to assume a program does not invoke undefined behaviour.

So, in TEST 1, the addresses are the same, because it is all the same variable, but the values differ because the first of each pair reflects what the compiler presumes (rightly) to be the value of the variable and the second reflects what is actually stored there. In TEST 2 and TEST 3, the compiler can't make the optimisation, because the compiler can't be 100% sure that the function argument will refer to a constant value (and in TEST 2, it doesn't).
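The distinction between TEST 1 and TEST 2/TEST 3 can be checked in well-defined code. The sketch below is a minimal illustration under that framing: test1_style, test2_style and usage_example are hypothetical names, and change_with_no_const_ref is a copy of the TEST 2 function without the LOG call. A const integral object with a constant initializer is usable in constant expressions (static_assert accepts it), which is exactly what lets the compiler substitute the literal; a const reference parameter gives no such guarantee.

```cpp
#include <cassert>

// TEST 1 situation: the value of const_value is known at compile time,
// so an optimizer may substitute the literal for every read.
unsigned int test1_style() {
    const unsigned int const_value = 0xcafe01e;
    static_assert(const_value == 0xcafe01e, "value known at compile time");
    return const_value;  // an optimizer is free to emit 0xcafe01e here
}

// TEST 2 situation (copy of the TEST 2 function, LOG call removed):
// a const reference says nothing about the object it refers to, so unless
// the compiler can see the caller it must read r from memory. Writing
// through the const_cast result is defined only if that object is not const.
void change_with_no_const_ref(const unsigned int &r) {
    unsigned int &no_const_ref = const_cast<unsigned int &>(r);
    no_const_ref = 0xfabada;
}

unsigned int test2_style() {
    unsigned int value = 0xcafe01e;   // non-const object, as in TEST 2
    change_with_no_const_ref(value);  // defined behaviour
    return value;                     // now 0xfabada
}
// Passing a const object instead (TEST 3) makes the same write undefined
// behaviour, even if it happens to appear to work.

void usage_example() {
    assert(test1_style() == 0xcafe01eu);
    assert(test2_style() == 0xfabadau);
}
```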



Source: https://stackoverflow.com/questions/16668656/explanation-of-the-ub-while-changing-data
