Casting a char array to an object pointer - is this UB?

我的未来我决定 submitted on 2019-12-07 00:22:26

Question


I recently saw a class like this that was used to construct objects "on-demand" without having to use dynamic memory allocation for various reasons.

#include <cassert>
#include <new> // placement new

template<typename T>
class StaticObject
{
public:
    StaticObject() : constructed_(false)
    {
    }

    ~StaticObject()
    {
        if (constructed_)
            ((T*)object_)->~T();
    }

    void construct()
    {
        assert(!constructed_);

        new ((T*)object_) T;
        constructed_ = true;
    }

    T& operator*()
    {
        assert(constructed_);

        return *((T*)object_);
    }

    const T& operator*() const
    {
        assert(constructed_);

        return *((T*)object_);
    }

private:
    bool constructed_;
    alignas(alignof(T)) char object_[sizeof(T)];
};

Is this code, namely the casting of a properly aligned char array to an object pointer, considered undefined behavior by the C++14 standard or is it completely fine?


Answer 1:


This program technically has undefined behavior, although it's likely to work on most implementations. The issue is that a cast from char* to T* is not guaranteed to result in a valid pointer to the T object created by placement new, even though the char* pointer represents the address of the first byte used for storage for the T object.

[basic.compound]/3:

Pointers to layout-compatible types shall have the same value representation and alignment requirements ([basic.align]).

In general, T will not be layout-compatible with char or with alignas(T) char[sizeof(T)], so there's no requirement that a pointer T* has the same value representation as a pointer char* or void*.

[basic.compound]/4:

Two objects a and b are pointer-interconvertible if:

  • they are the same object, or

  • one is a union object and the other is a non-static data member of that object ([class.union]), or

  • one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, any base class subobject of that object ([class.mem]), or

  • there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.

If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast. [ Note: An array object and its first element are not pointer-interconvertible, even though they have the same address. — end note ]

[Aside: DR 2287 changed "standard-layout union" to "union" in the second bullet after the publication of C++17. But that doesn't affect this program.]

The T object created by the placement new is not pointer-interconvertible with object_ or with object_[0]. And the note hints that this might be a problem for casts...
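For contrast, here is a minimal sketch of a case the third bullet does cover, using a hypothetical standard-layout type Header and a hypothetical function name. A standard-layout object and its first non-static data member are pointer-interconvertible, so a reinterpret_cast from one to the other yields a pointer to the member itself. No bullet links a char array to a T object that placement new creates inside it.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical standard-layout type used only for illustration.
struct Header
{
    int tag;
    double payload;
};

static_assert(std::is_standard_layout<Header>::value,
              "the first-member rule only applies to standard-layout classes");

// Header and its first member tag are pointer-interconvertible, so the cast
// below yields a pointer to h.tag itself, not merely a pointer with its address.
int read_tag_through_cast(const Header& h)
{
    const int* p = reinterpret_cast<const int*>(&h);
    assert(p == &h.tag);  // same address, and p points to h.tag
    return *p;            // well-defined read of h.tag
}
```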

For the C-style cast ((T*)object_), we need to see [expr.cast]/4:

The conversions performed by

  • a const_cast,

  • a static_cast,

  • a static_cast followed by a const_cast,

  • a reinterpret_cast, or

  • a reinterpret_cast followed by a const_cast

can be performed using the cast notation of explicit type conversion....

If a conversion can be interpreted in more than one of the ways listed above, the interpretation that appears first in the list is used, even if a cast resulting from that interpretation is ill-formed.

Unless T is char or cv-qualified char, this will effectively be a reinterpret_cast, so next we look at [expr.reinterpret.cast]/7:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type "pointer to cv T", the result is static_cast<cv T*>(static_cast<cv void*>(v)).
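A minimal sketch of the equivalence quoted above, using a hypothetical type Widget and a hypothetical function name. The C-style cast, the reinterpret_cast and the two-step static_cast all produce the same pointer value. The function only compares pointer values; it creates no Widget and dereferences nothing, so it says nothing about whether the result points to a Widget object.

```cpp
#include <cassert>

// Hypothetical type used only for illustration.
struct Widget
{
    int value;
};

// Forms the three casts discussed above on one suitably aligned buffer and
// reports whether they agree. Only pointer values are compared.
bool casts_agree()
{
    alignas(Widget) char buffer[sizeof(Widget)];
    Widget* via_c_style = (Widget*)buffer;                 // resolves to a reinterpret_cast
    Widget* via_reinterpret = reinterpret_cast<Widget*>(buffer);
    Widget* via_two_static = static_cast<Widget*>(static_cast<void*>(buffer));
    return via_c_style == via_reinterpret && via_reinterpret == via_two_static;
}
```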

So first we have a static_cast from char* to void*, which does the standard conversion described in [conv.ptr]/2:

A prvalue of type "pointer to cv T", where T is an object type, can be converted to a prvalue of type "pointer to cv void". The pointer value ([basic.compound]) is unchanged by this conversion.

This is followed by a static_cast from void* to T*, described in [expr.static.cast]/13:

A prvalue of type "pointer to cv1 void" can be converted to a prvalue of type "pointer to cv2 T", where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.

As already noted, the object of type T is not pointer-interconvertible with object_[0], so that sentence does not apply, and there's no guarantee that the result T* points at the T object! We're left with the sentence saying "the pointer value is unchanged", but this might not be the result we want if the value representations for char* and T* pointers are too different.

A Standard-compliant version of this class could be implemented using a union:

#include <cassert>
#include <new>

template<typename T>
class StaticObject
{
public:
    StaticObject() : constructed_(false), dummy_(0) {}
    ~StaticObject()
    {
        if (constructed_)
            object_.~T();
    }
    StaticObject(const StaticObject&) = delete; // or implement
    StaticObject& operator=(const StaticObject&) = delete; // or implement

    void construct()
    {
        assert(!constructed_);

        new(&object_) T;
        constructed_ = true;
    }

    T& operator*()
    {
        assert(constructed_);

        return object_;
    }

    const T& operator*() const
    {
        assert(constructed_);

        return object_;
    }

private:
    bool constructed_;
    union {
        unsigned char dummy_;
        T object_;
    };
};

Or even better, since this class is essentially attempting to implement an optional, just use std::optional if you have it or boost::optional if you don't.
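A minimal sketch of the std::optional route (C++17), with std::string standing in for T and a hypothetical function name. The optional reserves suitably aligned storage inside itself, tracks whether a value is present, and destroys the value on reset() or on its own destruction. The standard does not permit it to allocate the contained value dynamically (the string's own character buffer may still live on the heap).

```cpp
#include <cassert>
#include <optional>
#include <string>

std::string build_on_demand()
{
    std::optional<std::string> obj;        // storage reserved, nothing constructed yet
    assert(!obj.has_value());

    obj.emplace("constructed on demand");  // constructs the std::string in place
    assert(obj.has_value());

    std::string copy = *obj;               // operator* accesses the contained value
    obj.reset();                           // runs the string's destructor
    assert(!obj.has_value());
    return copy;
}
```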




Answer 2:


Casting a char array to an object pointer - is this UB?

Casting one pointer (the array decays to a pointer) to another pointer type that is not in the same inheritance hierarchy using a C-style cast performs a reinterpret_cast. A reinterpret_cast by itself never has UB.

However, indirecting through a converted pointer can have UB if an object of the appropriate type has not been constructed at that address. In this case, an object has been constructed in the character array, so the indirection has well-defined behaviour. Edit: The indirection would be UB-free if it weren't for the strict aliasing rules; see aschepler's answer for details. aschepler shows a C++14-conforming solution. In C++17, your code can be corrected with the following changes:

// C++17: placement new and std::launder are declared in <new>.
void construct()
{
    assert(!constructed_);
    new (object_) T; // removed cast
    constructed_ = true;
}

T& operator*()
{
    assert(constructed_);
    return *(std::launder((T*)object_));
}
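Putting those two changes into the question's class gives roughly the following sketch (C++17). Beyond the changes above, it deletes copying, routes every access through a private helper get() (a name chosen here), and uses unsigned char storage, which in C++17 is one of the array types that can provide storage for another object. These choices are assumptions, not part of the answer.

```cpp
#include <cassert>
#include <new>     // placement new, std::launder
#include <string>

template<typename T>
class StaticObject
{
public:
    StaticObject() = default;
    StaticObject(const StaticObject&) = delete;             // copying raw bytes would be wrong
    StaticObject& operator=(const StaticObject&) = delete;

    ~StaticObject()
    {
        if (constructed_)
            get()->~T();
    }

    void construct()
    {
        assert(!constructed_);
        new (object_) T;   // no cast needed: placement new takes a void*
        constructed_ = true;
    }

    T& operator*()
    {
        assert(constructed_);
        return *get();
    }

private:
    // std::launder turns a pointer holding the storage's address into a
    // pointer to the T object that currently lives there.
    T* get()
    {
        return std::launder(reinterpret_cast<T*>(object_));
    }

    bool constructed_ = false;
    alignas(T) unsigned char object_[sizeof(T)];
};

// Hypothetical usage: construct on demand, then write and read through operator*.
std::string static_object_demo()
{
    StaticObject<std::string> s;
    s.construct();
    *s = "hello";
    return *s;
}
```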

To construct an object into an array of another type, three requirements must be met to avoid UB: the other type must be allowed to alias the object type (char, unsigned char and std::byte satisfy this requirement for all object types); the address must be aligned to the boundary required by the object type; and none of the memory may be occupied by another object whose lifetime has not ended (ignoring the underlying objects of the array, which are allowed to alias the overlaid object). All of those requirements are satisfied by your program.
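A minimal sketch of why the alignment requirement needs the alignas in the question, with hypothetical names (Vec4, AlignedBytes, PlainBytes, can_hold). A byte array on its own is only 1-aligned, so without alignas it may be large enough for T but not aligned for it; the check below runs entirely at compile time.

```cpp
#include <cassert>

// Hypothetical over-aligned type used only for illustration.
struct alignas(16) Vec4
{
    float x, y, z, w;
};

// Byte storage declared the way the question declares object_.
template<typename T>
struct AlignedBytes
{
    alignas(T) unsigned char bytes[sizeof(T)];
};

// The same storage without alignas: big enough, but only byte-aligned.
template<typename T>
struct PlainBytes
{
    unsigned char bytes[sizeof(T)];
};

// True when Storage is both large enough and aligned enough to hold a T.
template<typename T, typename Storage>
constexpr bool can_hold()
{
    return sizeof(Storage) >= sizeof(T) && alignof(Storage) >= alignof(T);
}

static_assert(can_hold<Vec4, AlignedBytes<Vec4>>(), "alignas(T) storage satisfies T's alignment");
static_assert(!can_hold<Vec4, PlainBytes<Vec4>>(), "a plain byte array is only 1-aligned");
```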




Answer 3:


After writing a comment on @aschepler's answer, I think I found the proper answer:

No it is not UB!

Very strong hint: std::aligned_storage exists for exactly this purpose.
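A minimal sketch of std::aligned_storage used for this job, with std::string standing in for T and a hypothetical function name. It keeps the pointer returned by placement new instead of casting the storage's address, which sidesteps the pointer-value question entirely. Note that std::aligned_storage is deprecated in C++23, where an alignas byte array like the question's is the suggested replacement.

```cpp
#include <cassert>
#include <memory>       // std::destroy_at (C++17)
#include <new>          // placement new
#include <string>
#include <type_traits>  // std::aligned_storage

// A trivial type usable as uninitialized storage for one std::string.
using StringStorage =
    std::aligned_storage<sizeof(std::string), alignof(std::string)>::type;

std::string aligned_storage_demo()
{
    StringStorage storage;
    // Use the pointer placement new returns rather than re-casting &storage.
    std::string* p = new (&storage) std::string("built in aligned_storage");
    std::string copy = *p;
    std::destroy_at(p);  // end the string's lifetime; the storage itself is trivial
    return copy;
}
```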

  • basic.compound[4] gives us the definition of "pointer-interconvertible". None of the cases apply so T* and unsigned char[...] are not pointer-interconvertible.
  • conv.ptr[2] and expr.static.cast[13] tell us what happens on reinterpret_cast<T*>(object_). Basically, the (intermediate) cast to void* does not change the value of the pointer, and the cast from void* to T* does not change it either:

    If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.

    We have a properly aligned, not pointer-interconvertible type here. Hence unchanged value.

  • Before P0137 (mentioned in another answer), basic.compound[3] said:

    If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.

    Now basic.compound[3] says:

    Every value of pointer type is one of the following:

    (3.1) a pointer to an object or function (the pointer is said to point to the object or function), [...]

    I consider this equivalent for this purpose.

  • Finally we need basic.lval[11]

    If a program attempts to access the stored value of an object through a glvalue whose type is not similar ([conv.qual]) to one of the following types the behavior is undefined: [...]

    (11.3) a char, unsigned char, or std::byte type.

    This boils down to the aliasing rules which only allow certain types to alias and our unsigned char is part of it.

So in summary:

  • We meet the alignment and aliasing rules
  • We get a defined pointer value to a T* (which is the same as the unsigned char*)
  • Hence we have a valid object at that place

This is basically what @eerorika also has. But from the arguments above I think the code is fully valid, at least if T does not have any const or reference members, in which case std::launder must be used. Even then, if the memory is not reused (but only used for creating one T), it should also be valid.

However, older GCC (< 7.2) complains about a strict-aliasing violation (https://godbolt.org/z/Gjs05C), although the documentation states:

For example, an unsigned int can alias an int, but not a void* or a double. A character type may alias any other type.

This is a bug in GCC.




Answer 4:


When you create such a StaticObject, it reserves storage with the proper alignment and the correct size for a T object, but doesn't construct the object.

When construct() is called, it invokes a placement-new to construct the object in the reserved storage (properly aligned and not null). It's not the most natural way to proceed, but there is no UB here.

The only thing which could be UB is if the placement new overwrote an already existing object. But this is prevented by the assert().




Answer 5:


You do have undefined behavior.

object_ isn't a T*, so casting and dereferencing it is UB. You cannot use object_ to refer to the newly created object. This is also known as strict aliasing.

The fix however is easy: Just create a new member variable T* that you use to access the constructed object. Then you need to assign the result of placement new to that pointer:

ptr = new(object_) T;
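A minimal sketch of the question's class reworked along these lines. The member name ptr_ and the demo function are hypothetical; here ptr_ also replaces the constructed_ flag, since a null ptr_ means nothing has been constructed yet.

```cpp
#include <cassert>
#include <new>
#include <string>

template<typename T>
class StaticObject
{
public:
    StaticObject() = default;
    StaticObject(const StaticObject&) = delete;
    StaticObject& operator=(const StaticObject&) = delete;

    ~StaticObject()
    {
        if (ptr_)
            ptr_->~T();
    }

    void construct()
    {
        assert(ptr_ == nullptr);
        ptr_ = new (object_) T;  // this pointer really points to the new T
    }

    T& operator*()
    {
        assert(ptr_ != nullptr);
        return *ptr_;
    }

private:
    T* ptr_ = nullptr;  // null until construct() runs
    alignas(T) unsigned char object_[sizeof(T)];
};

std::string pointer_member_demo()
{
    StaticObject<std::string> s;
    s.construct();
    *s = "reached through ptr_";
    return *s;
}
```

The cost is one extra pointer per object; in exchange, every access goes through a pointer that placement new itself returned, so no cast from the storage is needed.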

[basic.life]p1 says:

The lifetime of an object o of type T ends when:

  • if T is a class type with a non-trivial destructor, the destructor call starts, or

  • the storage which the object occupies is released, or is reused by an object that is not nested within o.

So by doing new (object_) T;, you are ending the lifetime of the original char[] object and are starting the lifetime of the new T object we'll call t.

Now we have to examine whether *((T*)object_) is valid.

[basic.life]p8, where the condition that matters here is the second bullet:

If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if:

  • the storage for the new object exactly overlays the storage location which the original object occupied, and

  • the new object is of the same type as the original object (ignoring the top-level cv-qualifiers), and

  • the type of the original object is not const-qualified, and, if a class type, does not contain any non-static data member whose type is const-qualified or a reference type, and

The second point is not true (T vs char[]), so you cannot use object_ as a pointer to the newly created object t.
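For contrast, a minimal sketch of the case [basic.life]p8 does cover, with a hypothetical function name: when the replacement object has the same type as the original, exactly overlays it, and is not const, the original name and pointers transparently refer to the new object.

```cpp
#include <cassert>
#include <new>

int reuse_same_type()
{
    int n = 1;
    int* old_ptr = &n;
    new (&n) int(2);       // ends the first int's lifetime and creates a new int in the same storage
    // Same type, exact overlay, not const: old_ptr and the name n now refer to the new int.
    return *old_ptr + n;   // 2 + 2
}
```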



Source: https://stackoverflow.com/questions/51231757/casting-a-char-array-to-an-object-pointer-is-this-ub
