问题

Consider the following simple example:

struct __attribute__ ((__packed__)) {
 int code[1];
 int place_holder[100];
} s;

void test(int n)
{
 int i;

 for (i = 0; i < n; i++) {
  s.code[i] = 1;
 }
}

The for-loop is writing to the field code, which is of size 1. The next field after code is place_holder.
I would expect that in case of n > 1, the write to code array would overflow and 1 would be written to place_holder.

However, when compiling with -O2 (on gcc 4.9.4 but probably on other versions as well) something interesting happens.
The compiler identifies that the code might overflow array code, and limits loop unrolling to 1 iteration.

It's easy to see that when compiling with -fdump-tree-all and looking at the last tree pass ("t.optimized"):


;; Function test (test, funcdef_no=0, decl_uid=1366, symbol_order=1)

Removing basic block 5
test (int n)
{
  <bb 2>:
  # DEBUG i => 0
  # DEBUG i => 0
  if (n_4(D) > 0)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  s.code[0] = 1;
  # DEBUG i => 1
  # DEBUG i => 1

  <bb 4>:
  return;

}

So in this case the compiler completely unrolled the loop to a single iteration.

My questions are:

From C specification viewpoint, is overflowing (deliberately) from one struct member to the next is illegal or undefined behavior?
Let's assume I'm aware of the struct layout in memory and know what I'm doing when deliberately overflowing the code array.
Is there a way to prevent gcc from unrolling the loop in such case? I know I can completely prevent loop unrolling, however I'm still interested in loop unrolling on other cases. I also suspect that the analysis the compiler is doing might affect passes other than loop unrolling.
gcc is assuming I'm not going to overflow when accessing my array, so what I'm really looking for is way to tell the compiler not to take this assumption (by providing some compiler option).

I'm aware it's a bad practice to write such code that overflows from one field to another, and I'm not intending to write such code.
I'm also aware of the practice to put an array (possibly zero sized) as the last struct field to allow it to overflow, this is well supported by compilers, while in this case the array code is not the last field.
So this is not a question of "how to fix the code", but rather a question of understanding the compiler assumptions and affecting them.

These questions came up when I observed existing code that was already written in such way, and debugged it to find out why it's not behaving as the original developer expected it to behave.
The risk is that there are other places in the code where such problem exists. Static analysis tools can help to find out, but I would also like to know if there's a way to make the compiler tolerate such code and still generate the result we would expect.

Update

I got clear answer to question (1) above, but not for question (2).

Can gcc allow this as an extension, by some compile options?
Is there a way to at least get a warning when gcc identifies it? (and it clearly identifies it, by optimizing things out).
That's important in order to identify such cases in a large existing code base.

回答1:

From C specification viewpoint, is overflowing (deliberately) from one struct member to the next is illegal or undefined behavior?

It is undefined behavior. The arr[i] operator is syntactic sugar around *(arr + i). So array access boils down to the binary + operator for pointer arithmetic, C17 6.5.6 additive operators, from §7 and §8:

For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. /--/
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

As you noticed, optimizing compilers might exploit these rules to produce faster code.

Is there a way to prevent gcc from unrolling the loop in such case?

There is a a special exception rule that can be used, C17 6.3.2.3/7:

When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

Also, strict aliasing does not apply to character types, because of another special rule in C17 6.5 §7

An object shall have its stored value accessed only by an lvalue expression that has one of the following types: ... a character type.

These two special rules co-exist in harmony. So assuming we don't mess up alignment etc during the pointer conversion, this means that we are allowed to do this:

unsigned char* i;
for(i = (unsigned char*)&mystruct; i < (unsigned char*)(&mystruct + 1); i++)
{
  do_something(*i);
}

This may however read padding bytes etc so it's "implementation-defined". But in theory you can access the struct byte per byte, and as long as the struct offsets are calculated on byte-per-byte basis, you can iterate across multiple members of the struct (or any other object) in this manner.

As far as I can tell, this very questionable-looking code should be well-defined:

#include <stdint.h>
#include <string.h>
#include <stdio.h>

struct __attribute__ ((__packed__)) {
 int code[1];
 int place_holder[100];
} s;

void test(int val, int n)
{
  for (unsigned char* i = (unsigned char*)&s; 
       i < (unsigned char*)&s + n*sizeof(int); 
       i += _Alignof(int)) 
  {
    if((uintptr_t)i % _Alignof(int) == 0) // not really necessary, just defensive prog.
    {
      memcpy(i, &val, sizeof(int));
      printf("Writing %d to address %p\n", val, (void*)i);
    }
  }
}

int main (void)
{
  test(42, 3);
  printf("%d %d %d\n", s.code[0], s.place_holder[0], s.place_holder[1]);
}

This works fine on gcc and clang (x86). How efficient it is, well that's another story. Please don't write code like this, though.

回答2:

From C specification viewpoint, is overflowing (deliberately) from one struct member to the next is illegal or undefined behavior?

It's undefined behavior to access an array out-of-bounds. From C11 J.2:

The behavior is undefined in the following circumstances:

[...]

An array subscript is out of range [...]

Is there a way to prevent gcc from unrolling the loop in such case?

Alias code with a volatile pointer. But even using an intermediary pointer seems to work. godbolt link

回答3:

Just _Static_assert the layout and do the pointer arithmetic in (char*), then cast to (int*) and do the access. No further tricks such as memcpy/_Alignof are required because ints are unpadded and you are accessing ints where there really are ints.

This alone makes gcc unroll the loop.

Why character-pointer based (char*, signed char*, unsigned char*) pointer arithmetic is required is because http://port70.net/~nsz/c/c11/n1570.html#J.2 (non-normatively, as it is just an appendix, but gcc seems to follow it) makes out-of bounds accesses UB, but http://port70.net/~nsz/c/c99/n1256.html#6.2.6.1p4 and http://port70.net/~nsz/c/c99/n1256.html#6.5p6 still allow inspecting any object via character pointers (more discussion on this at Is accessing an element of a multidimensional array out of bounds undefined behavior?).

Alternatively you could do the pointer arithmetic via uintptr_t (then it will be implementation defined) but gcc optimizes those worse in certain cases (gcc doesn't fold (uintptr_t)p < (uintptr_t)(p+10) into true, but it does so for (char*)p < (char*)(p+10). This could be considered a missed optimization).

struct  __attribute__ ((__packed__)) s {
    int code[1];
    int place_holder[100];
} s;


void test_s(int n) //original
{
    int i;
    for (i = 0; i < n; i++) {
        s.code[i] = 1;
    }
}

#include <stddef.h> //offsetof
void test_s2(int n) //unrolls the loop
{
    _Static_assert(offsetof(struct s,code)+sizeof(int)==offsetof(struct s,place_holder),"");
    //^will practically hold even without __attribute__((__packed__))

    int i; for (i = 0; i < n; i++)
        *(int*)((char*)&s.code + (size_t)i*sizeof(s.code[0])) = 1;
}

/////////////


//same code as test_s2
struct r {
    int code101[101];
} r;
void test_r(int n)
{
    int i;

    for (i = 0; i < n; i++) {
        r.code101[i] = 1;
    }
}

回答4:

1. Question:

"From C specification viewpoint, is overflowing (deliberately) from one struct member to the next illegal or undefined behavior?"

It is undefined behavior. The C standard states (emphasize mine):

"A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero)."

Source: ISO/IEC 9899:2018 (C18), §6.5.2.1/2

"When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P) + N (equivalently, N + (P)) and (P) - N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P) + 1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q) - 1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated."

Source: ISO/IEC 9899:2018 (C18), §6.5.6/8

Also non-normative Annex J states with regard to paragraph §6.5.6 in the normative standard:

J.2 Undefined behavior

1 The behavior is undefined in the following circumstances:

....

An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).

2. Question (plus update):

"Is there a way to prevent gcc from unrolling the loop in such case?"

"Can gcc allow this as an extension, by some compile options?"

"Is there a way to at least get a warning when gcc identifies it? That's important in order to identify such cases in a large existing code base."

You could try to place an empty assembly code function like asm(""); into the loop, as shown in this answer by Denilson Sá Maia, f.e.:

 for (i = 0; i < n; i++) {
    s.code[i] = 1;
    asm("");
 }

or #pragma's around the test function, as shown here, f.e.:

#pragma GCC push_options
#pragma GCC optimize ("O0")

void test(int n)
{
   int i;

   for (i = 0; i < n; i++) {
      s.code[i] = 1;
   }
}

#pragma GCC pop_options

to prevent the optimization for that specific program part in general and with that the loop unrolling.

How to prevent gcc optimizing some statements in C?
How to prevent GCC from optimizing out a busy wait loop?
Is there a way to tell GCC not to optimise a particular piece of code?

It is not preventing the loop unrolling, but you can use AddressSanitizer, which also got LeakSanitizer integrated, and is built into GCC since version 4.8 to detect when the loop unrolling doesn't work/you access non-affiliated memory.

More information about this, you can find here.

Edit: As you said your target implementation is MIPS, you can still use Valgrind to detect memory leaks.

回答5:

This is not correct. The answer can be found here.

It is not guaranteed that the compiler will not nest some padding space of arbitrary size between code and place_holder. It is implementation defined if the padding space is inserted.

回答6:

In the language Dennis Ritchie described in 1974, the behavior of struct member access operators and pointer arithmetic were defined in terms of machine addresses, and except for the use of object size to scale pointer arithmetic, were agnostic as to the types of objects the addresses represented. The C Standard allows implementations to behave in that fashion when their customers would find it useful, but would also allow them to do other things, such as trapping out-of-bounds array accesses, if customers would find those other behaviors more useful.

Although later C dialects effectively behaved as though struct member names are prefixed by the struct name, so as to give each structure type its own member namespace, in most other respects compilers can be configured, by disabling optimizations if nothing else, to behave in a fashion consistent with Ritchie's 1974 language. Unfortunately, there's no way to distinguish implementations that will consistently behave in that fashion from those that won't; some compilers, especially those which go back to a time before the Standard, don't explicitly document that they support the 1974 behaviors because they were written at a time when compilers were generally expected to do so unless they documented otherwise.

来源：https://stackoverflow.com/questions/62692609/allowing-struct-field-to-overflow-to-the-next-field

标签

gcc

struct

compiler-optimization

loop-unrolling

Allowing struct field to overflow to the next field

问题