Dereferencing a one-past-the-end pointer to array type

Question

Is it well defined in C++ to dereference a one-past-the-end pointer to an array type?

Consider the following code:

#include <cassert>
#include <iterator>

int main()
{
    // An array of ints
    int my_array[] = { 1, 2, 3 };

    // Pointer to the array
    using array_ptr_t = int(*)[3];
    array_ptr_t my_array_ptr = &my_array;

    // Pointer one-past-the-end of the array
    array_ptr_t my_past_end = my_array_ptr + 1;

    // Is this valid?
    auto is_this_valid = *my_past_end;

    // Seems to yield one-past-the-end of my_array
    assert(is_this_valid == std::end(my_array));
}

Common wisdom is that it's undefined behavior to dereference a one-past-the-end pointer. However, does this hold true for pointers to array types?

It seems reasonable that this should be valid since *my_past_end can be solved purely with pointer arithmetic and yields a pointer to the first element in the array that would be there, which happens to also be a valid one-past-the-end int* for the original array my_array.

However, another way of looking at it is that *my_past_end is producing a reference to an array that doesn't exist, which implicitly converts to an int*. That reference seems problematic to me.

For context, my question was brought on by this question, specifically the comments to this answer.

Edit: This question is not a duplicate of Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not? I'm asking whether the rule explained in that question also applies to pointers that point to an array type.

Edit 2: Removed auto to make explicit that my_array_ptr is not an int*.

Answer 1:

This is CWG 232. That issue might seem like it's mainly about dereferencing a null pointer but it's fundamentally about what it means to simply dereference something that doesn't point to an object. There is no explicit language rule about this case.

One of the examples in the issue is:

Similarly, dereferencing a pointer to the end of an array should be allowed as long as the value is not used:
char a[10];
char *b = &a[10];   // equivalent to "char *b = &*(a+10);"
Both cases come up often enough in real code that they should be allowed.

This is basically the same thing as OP (the a[10] part of the above expression), except using char instead of an array type.

Common wisdom is that it's undefined behavior to dereference a one-past-the-end pointer. However, does this hold true for pointers to array types?

There is no difference in the rules based on what kind of pointer it is. my_past_end is a past-the-end pointer, so whether it's UB to dereference it or not is not a function of the fact that it points to an array as opposed to any other kind of type.

The type of is_this_valid is int*, and it is initialized from an int(&)[3] through array-to-pointer decay, so nothing here actually reads from memory. That is immaterial to the way the language rules work, though. my_past_end is a pointer whose value is past the end of an object, and that's the only thing that matters.
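For comparison, here is a minimal sketch of two ways to obtain the same int* without applying * to a past-the-end pointer at all, so the open question raised by CWG 232 never comes up. The function name past_end_pointers_agree is made up for this illustration and is not part of the original question.

```cpp
#include <iterator>

// Hypothetical helper: forms the one-past-the-end int* in two ways that
// never dereference a past-the-end pointer, and reports whether they agree.
inline bool past_end_pointers_agree()
{
    int my_array[] = { 1, 2, 3 };

    // Pointer arithmetic on the decayed int* is valid up to one past the end.
    int* by_arithmetic = my_array + 3;

    // std::end on a built-in array returns the one-past-the-end pointer.
    int* by_std_end = std::end(my_array);

    return by_arithmetic == by_std_end;
}
```

Calling past_end_pointers_agree() from main should return true; either pointer can then be used as an end iterator, as long as it is never read through.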

Answer 2:

The standard seems to suggest that this is not undefined behaviour.

The relevant section of the standard, which covers the result of adding an integral value to a pointer (in either operand order), is as follows:

§5.7p4 [expr.add]

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object⁸⁴, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. [...] the expression (P)+1 points one past the last element of the array object. [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

With footnote 84 reading:

An object that is not an array element is considered to belong to a single element array for this purpose; see 5.3.1

(And §5.3.1 is about & and *)

So, for the purposes of where my_array_ptr and my_past_end point, they point into my_array as if my_array were actually an int[1][3]. my_array_ptr points to the first element (the int[3] that my_array actually is). my_past_end points to the one-past-the-end element, and forming that pointer is well defined.

When you do *my_past_end, you create an lvalue of type int[3]. As long as this is not converted to a prvalue, you do not actually access memory that is not an int[3] as if it was an int[3].
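The footnote's "single element array" rule can be sketched as follows. This is only an illustration of forming and subtracting such pointers; the function names are invented for the example, and nothing is dereferenced.

```cpp
#include <cstddef>

// Hypothetical helper: a lone int is treated as element 0 of an int[1],
// so &x + 1 is a valid one-past-the-end pointer.
inline std::ptrdiff_t distance_past_single_object()
{
    int x = 42;
    int* first = &x;
    int* past_end = &x + 1;   // valid to form; never dereferenced here
    return past_end - first;  // both pointers belong to the same "array"
}

// Hypothetical helper: the same rule applied to a whole int[3] object,
// which is what my_array_ptr + 1 relies on in the question.
inline std::ptrdiff_t distance_past_whole_array()
{
    int my_array[] = { 1, 2, 3 };
    int (*first)[3] = &my_array;
    int (*past_end)[3] = first + 1;  // one past the single int[3] "element"
    return past_end - first;
}
```

Both functions return 1, which matches the reading that my_past_end is an ordinary one-past-the-end pointer of a one-element array of int[3].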

§3.9.2p1 [basic.compound]

Compound types can be constructed in the following ways:
[...]
4. references to objects or functions of a given type

§3.9.2p3 [basic.compound]

[...] [ Note: For instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array's element type that might be located at that address [...]

Notice how it tries very hard to make sure that the past-the-end pointer is still defined as the address of an object. Because references can only refer to objects, this allows "invalid" references like *(past-the-end-pointer) while still disallowing null references, as nullptr doesn't point to an object.

§4.2p1 [conv.array]

An lvalue or rvalue of type "array of N T" or "array of unknown bound of T" can be converted to a prvalue of type "pointer to T". The result is a pointer to the first element of the array.

Since it is the lvalue itself that is being converted, no invalid memory is accessed. During the conversion, a prvalue of type int* is created pointing to the same address as &my_array[3], which is the value of std::end(my_array). So they will be equal (as, unsurprisingly, pointers that point to the same address are defined as equal).
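To see the array-to-pointer conversion in isolation, here is a small sketch that applies it to a pointer to an array that really exists, where nothing is in dispute. The function name is hypothetical; the static_assert uses decltype, which does not evaluate its operand.

```cpp
#include <type_traits>

// Hypothetical helper: dereferences a pointer to an existing int[3] and
// lets the resulting lvalue decay to int*.
inline bool decay_points_to_first_element()
{
    int my_array[] = { 1, 2, 3 };
    int (*my_array_ptr)[3] = &my_array;

    // *my_array_ptr is an lvalue of type int[3].
    static_assert(std::is_same<decltype(*my_array_ptr), int(&)[3]>::value,
                  "dereferencing int(*)[3] yields an lvalue of type int[3]");

    // Array-to-pointer conversion: the result points to the first element.
    int* first = *my_array_ptr;

    return first == &my_array[0];
}
```

The function returns true. The argument above is that the same conversion, applied to *my_past_end, yields the address one past the last int without reading any element.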

You could also convert my_past_end to an int* directly and it would work, as int[3] is a compound type of ints (int is a subobject of int[3]), so that would be a less confusing way to do it.

As a side note, the reason why &my_array[3] works in C as well as C++, even though C has no references, is that my_array[3] is defined as *(my_array + 3), so &my_array[3] is &*(my_array + 3), and in C, &*(expression) is the same as converting (expression) into an rvalue (and effectively asserting that it's a non-null pointer). Since no such exception exists in C++, the logic shown here (my_array[3] is a reference that can't be converted into a prvalue) is used.

It does seem pretty ambiguous. "Unrelated" objects are never mentioned again in the standard. They could be occupied by something else, for example in:

int arr[3][3] = {
    {1, 2, 3},
    {4, 5, 6},
    {7, 8, 9}
};

arr[0][3] and arr[1][0] refer to the same memory address. But does arr[0][3] = 10; mean that arr[1][0] will be updated to read 10?

int test() {
    int arr[3][3] = {
        {1, 2, 3},
        {4, 5, 6},
        {7, 8, 9}
    };

    const int& i = arr[1][0];
    arr[0][3] = 10;
    return i;
}

This seems to return 10 in both MSVC and GCC (the latter optimising it to mov eax, 10 followed by ret).

Since the reference refers to an object with an address, &(reference) is well defined. But since these "unrelated" objects are never mentioned again, using anything other than their address is, by virtue of it literally not being defined, undefined behaviour.
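A sketch of the same experiment that stays inside the parts of the language nobody disputes: it compares the two addresses without writing through arr[0][3], and performs the write through the in-bounds subscript instead. The function names are invented for this illustration.

```cpp
// Hypothetical helper: the past-the-end pointer of row 0 and the pointer
// to the first element of row 1 represent the same address, because array
// elements are stored contiguously.
inline bool row_end_shares_address_with_next_row()
{
    int arr[3][3] = {
        {1, 2, 3},
        {4, 5, 6},
        {7, 8, 9}
    };

    const int* end_of_row0 = arr[0] + 3;  // formed, never dereferenced
    const int* start_of_row1 = arr[1];    // decays to &arr[1][0]

    return end_of_row0 == start_of_row1;
}

// Hypothetical helper: writes through the in-bounds name of the element,
// so the reference observes the new value with no out-of-bounds subscript.
inline int write_through_in_bounds_subscript()
{
    int arr[3][3] = {
        {1, 2, 3},
        {4, 5, 6},
        {7, 8, 9}
    };

    const int& i = arr[1][0];
    arr[1][0] = 10;
    return i;
}
```

The first function shows why the two expressions alias in practice; the second returns 10 without relying on what arr[0][3] = 10 means.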

Answer 3:

I believe it's well defined, because it doesn't dereference the one-past-the-end pointer.

auto is_this_valid = *my_past_end;

my_past_end is of type int(*)[3] (pointer to array of 3 int elements). The expression *my_past_end is therefore of type int[3], so like any array expression in this context, it "decays" to a pointer of type int*, pointing to the initial (zeroth) element of the array object. This "decay" is a compile-time operation. So the initialization simply initializes is_this_valid, a pointer of type int*, to point just past the end of my_array. No memory past the end of the array object is accessed.
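The types involved can be checked at compile time without evaluating anything. The following is a sketch using std::declval inside decltype, so no pointer is formed or dereferenced; the alias and variable names other than array_ptr_t are assumptions made for the example.

```cpp
#include <type_traits>
#include <utility>

using array_ptr_t = int(*)[3];

// Type of the expression *p for p of type int(*)[3]; decltype does not
// evaluate its operand.
using deref_t = decltype(*std::declval<array_ptr_t>());

// Type that the array expression decays to when used as an initializer.
using decayed_t = std::decay<deref_t>::type;

constexpr bool deref_is_array_lvalue = std::is_same<deref_t, int(&)[3]>::value;
constexpr bool decays_to_int_pointer = std::is_same<decayed_t, int*>::value;

static_assert(deref_is_array_lvalue, "*p is an lvalue of type int[3]");
static_assert(decays_to_int_pointer, "int[3] decays to int*");
```

This only confirms the static types described above. Whether evaluating *my_past_end at run time is permitted is the separate question the other answers discuss.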

Source: https://stackoverflow.com/questions/52727045/dereferencing-one-past-the-end-pointer-to-array-type

Tags: c++, arrays, language-lawyer, dereference