May I treat a 2D array as a contiguous 1D array?

问题

Consider the following code:

int a[25][80];
a[0][1234] = 56;
int* p = &a[0][0];
p[1234] = 56;

Does the second line invoke undefined behavior? How about the fourth line?

回答1:

It's up to interpretation. While the contiguity requirements of arrays don't leave much to the imagination in terms of how to layout a multidimensional arrays (this has been pointed out before), notice that when you're doing p[1234] you're indexing the 1234th element of the zeroth row of only 80 columns. Some interpret the only valid indices to be 0..79 (&p[80] being a special case).

Information from the C FAQ which is the collected wisdom of Usenet on matters relevant to C. (I do not think C and C++ differ on that matter and that this is very much relevant.)

回答2:

Both lines do result in undefined behavior.

Subscripting is interpreted as pointer addition followed by an indirection, that is, a[0][1234]/p[1234] is equivalent to *(a[0] + 1234)/*(p + 1234). According to [expr.add]/4 (here I quote the newest draft, while for the time OP is proposed, you can refer to this comment, and the conclusion is the same):

If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n; otherwise, the behavior is undefined.

since a[0](decayed to a pointer to a[0][0])/p points to an element of a[0] (as an array), and a[0] only has size 80, the behavior is undefined.

As Language Lawyer pointed out in the comment, the following program does not compile.

constexpr int f(const int (&a)[2][3])
{
    auto p = &a[0][0];
    return p[3];
}

int main()
{
    constexpr int a[2][3] = { 1, 2, 3, 4, 5, 6, };
    constexpr int i = f(a);
}

The compiler detected such undefined behaviors when it appears in a constant expression.

回答3:

Yes, you can(no, it's not UB), it is indirectly guaranteed by the standard. Here's how: a 2D array is an array of arrays. An array is guaranteed to have contiguous memory and sizeof(array) is sizeof(elem) times number of elements. From these it follows that what you're trying to do is perfectly legal.

回答4:

In the language the Standard was written to describe, there would be no problem with invoking a function like:

void print_array(double *d, int rows, int cols)
{
  int r,c;
  for (r = 0; r < rows; r++)
  {
    printf("%4d: ", r);
    for (c = 0; c < cols; c++)
      printf("%10.4f ", d[r*cols+c]);
    printf("\n");
  }
}

on a double[10][4], or a double[50][40], or any other size, provided that the total number of elements in the array was less than rows*cols. Indeed, the guarantee that the row stride of a T[R][C] would equal C * sizeof (T) was designed among other things to make it possible to write code that could work with arbitrarily-sized multi-dimensional arrays.

On the other hand, the authors of the Standard recognized that when implementations are given something like:

double d[10][10];
double test(int i)
{
  d[1][0] = 1.0;
  d[0][i] = 2.0;
  return d[1][0]; 
}

allowing them to generate code that would assume that d[1][0] would still hold 1.0 when the return executes, or allowing them to generate code that would trap if i is greater than 10, would allow them to be more suitable for some purposes than requiring that they silently return 2.0 if invoked with i==10.

Nothing in the Standard makes any distinction between those scenarios. While it would have been possible for the Standard to have included rules that would say that the second example invokes UB if i >= 10 without affecting the first example (e.g. say that applying [N] to an array doesn't cause it to decay to a pointer, but instead yields the Nth element, which must exist in that array), the Standard instead relies upon the fact that implementations are allowed to behave in useful fashion even when not required to do so, and compiler writers should presumably be capable of recognizing situations like the first example when doing so would benefit their customers.

Since the Standard never sought to fully define everything that programmers would need to do with arrays, it should not be looked to for guidance as to what constructs quality implementations should support.

回答5:

Your compiler will throw a bunch of warnings/errors because of subscript out of range (line 2) and incompatioble types (line 3), but as long as the actual variable (int in this case) is one of the intrinsic base-types this is save to do in C and C++. (If the variable is a class/struct it will probably still work in C, but in C++ all bets are off.)

Why you would want to do this.... For the 1st variant: If your code relies on this sort of messing about it will be error-prone and hard to maintain in the long run.

I can see a some use for the second variant when performance optimizing loops over 2D arrays by replacing them by a 1D pointer run over the data-space, but a good optimizing compiler will often do that by itself anyway. If the body of the loop is so big/complex the compiler can't optimize/replace the loop by a 1D run on it's own, the performance gain by doing it manually will most likely not be significant either.

回答6:

You're free to reinterpret the memory any way you'd like. As long as the multiple does not exceed the linear memory. You can even move a to 12, 40 and use negative indexes.

回答7:

The memory referenced by a is both a int[25][80] and a int[2000]. So says the Standard, 3.8p2:

[ Note: The lifetime of an array object starts as soon as storage with proper size and alignment is obtained, and its lifetime ends when the storage which the array occupies is reused or released. 12.6.2 describes the lifetime of base and member subobjects. — end note ]

a has a particular type, it is an lvalue of type int[25][80]. But p is just int*. It is not "int* pointing into a int[80]" or anything like that. So in fact, the int pointed to is an element of int[25][80] named a, and also an element of int[2000] occupying the same space.

Since p and p+1234 are both elements of the same int[2000] object, the pointer arithmetic is well-defined. And since p[1234] means *(p+1234), it too is well-defined.

The effect of this rule for array lifetime is that you can freely use pointer arithmetic to move through a complete object.

Since std::array got mentioned in the comments:

If one has std::array<std::array<int, 80>, 25> a; then there does not exist a std::array<int, 2000>. There does exist a int[2000]. I'm looking for anything that requires sizeof (std::array<T,N>) == sizeof (T[N]) (and == N * sizeof (T)). Absent that, you have to assume that there could be gaps which mess up traversal of nested std::array.

来源：https://stackoverflow.com/questions/66100693/is-flattening-a-multi-dimensional-array-with-a-cast-ub

标签

c++

arrays

pointers

multidimensional-array

undefined-behavior