Multiple structures in a single malloc invoking undefined behaviour?

Question

The page "Use the correct syntax when declaring a flexible array member" says that when a single malloc allocates a header plus flexible data, with data[1] hacked into the struct,

This example has undefined behavior when accessing any element other than the first element of the data array. (See the C Standard, 6.5.6.) Consequently, the compiler can generate code that does not return the expected value when accessing the second element of data.

I looked up the C Standard 6.5.6, and could not see how this would produce undefined behaviour. I've used a pattern that I'm comfortable with, where the header is implicitly followed by data, using the same sort of malloc,

#include <stdlib.h> /* EXIT malloc free */
#include <stdio.h>  /* printf */
#include <string.h> /* strlen memcpy */

struct Array {
    size_t length;
    char *array;
}; /* +(length + 1) char */

static struct Array *Array(const char *const str) {
    struct Array *a;
    size_t length;
    length = strlen(str);
    if(!(a = malloc(sizeof *a + length + 1))) return 0;
    a->length = length;
    a->array = (char *)(a + 1); /* UB? */
    memcpy(a->array, str, length + 1);
    return a;
}

/* Take a char off the end just so that it's useful. */
static void Array_to_string(const struct Array *const a, char (*const s)[12]) {
    const int n = a->length ? a->length > 9 ? 9 : (int)a->length - 1 : 0;
    sprintf(*s, "<%.*s>", n, a->array);
}

int main(void) {
    struct Array *a = 0, *b = 0;
    int is_done = 0;
    do { /* Try. */
        char s[12], t[12];
        if(!(a = Array("Foo!")) || !(b = Array("To be or not to be."))) break;
        Array_to_string(a, &s);
        Array_to_string(b, &t);
        printf("%s %s\n", s, t);
        is_done = 1;
    } while(0); if(!is_done) {
        perror(":(");
    } {
        free(a);
        free(b);
    }
    return is_done ? EXIT_SUCCESS : EXIT_FAILURE;
}

Prints,

<Foo> <To be or >

The compliant solution uses C99 flexible array members. The page also says,

Failing to use the correct syntax when declaring a flexible array member can result in undefined behavior, although the incorrect syntax will work on most implementations.

Technically, does this C90 code produce undefined behaviour, too? And if not, what is the difference? (Or is the Carnegie Mellon Wiki incorrect?) What is it about the implementations where this will not work that makes it fail?

Answer 1:

This should be well defined:

a->array = (char *)(a + 1);

This creates a pointer to one element past the end of an array of size 1 but does not dereference it, which is allowed. And because a->array now points to bytes that do not yet have an effective type, you can use them safely.

This only works however because you're using the bytes that follow as an array of char. If you instead tried to create an array of some other type whose size is greater than 1, you could have alignment issues.

For example, if you compiled a program for ARM with 32 bit pointers and you had this:

struct Array {
    int size;
    uint64_t *a;
};
...
struct Array *a = malloc(sizeof *a + (length * sizeof(uint64_t)));
a->size = length;
a->a = (uint64_t *)(a + 1);       // misaligned pointer
a->a[0] = 0x1111222233334444ULL;  // misaligned write

Your program would crash due to a misaligned write. So in general you shouldn't depend on this. Best to stick with a flexible array member which the standard guarantees will work.
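For comparison, here is a minimal sketch of how the question's string container might look with a C99 flexible array member. The names FlexArray and flex_array_new are hypothetical and do not come from the question or the linked page.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C99 counterpart of the question's struct Array. */
struct FlexArray {
    size_t length;
    char data[];   /* flexible array member: must be last, no size given */
};

/* Allocates the header and a copy of str in one block.
 * Returns NULL if the allocation fails. */
struct FlexArray *flex_array_new(const char *str) {
    struct FlexArray *a;
    size_t length;
    assert(str != NULL);
    length = strlen(str);
    a = malloc(sizeof *a + length + 1);   /* +1 for the terminating '\0' */
    if (!a) return NULL;
    a->length = length;
    memcpy(a->data, str, length + 1);
    return a;
}
```

A single free(a) releases both the header and the characters. The compiler places data at an offset suitable for its element type, so no manual pointer arithmetic is needed.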

Answer 2:

As an adjunct to @dbush's good answer, a way to get around alignment woes is to use a union. This ensures &p[1] is properly aligned for (uint64_t*)¹. Unlike sizeof *a, sizeof *p includes any padding needed for that alignment.

  union {
    struct Array header;
    uint64_t dummy;
  } *p;
  p = malloc(sizeof *p + length * sizeof *p->header.array);

  struct Array *a = (struct Array *)&p[0]; // or = &(p->header);
  a->length = length;
  a->array = (uint64_t*) &p[1]; // or &p[1].dummy;

Or go with C99 and a flexible array member.

¹ As well as struct Array
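The fragment above can be fleshed out as follows. This is only a sketch: the names U64Array, U64ArrayBlock and u64_array_new are hypothetical, the struct layout (a size_t length plus a uint64_t pointer) is assumed from the fragment, and the alignment assertion uses _Alignof, which requires C11.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct U64Array {              /* hypothetical header type */
    size_t length;
    uint64_t *array;
};

union U64ArrayBlock {
    struct U64Array header;
    uint64_t align;            /* forces uint64_t alignment and padding */
};

/* Allocates the header and `length` elements in one block.
 * Returns NULL on overflow or allocation failure. */
struct U64Array *u64_array_new(size_t length) {
    union U64ArrayBlock *p;
    struct U64Array *a;
    /* Reject sizes whose byte count would overflow size_t. */
    if (length > (SIZE_MAX - sizeof *p) / sizeof(uint64_t)) return NULL;
    p = malloc(sizeof *p + length * sizeof(uint64_t));
    if (!p) return NULL;
    a = &p->header;            /* same address as p */
    a->length = length;
    a->array = (uint64_t *)(void *)(p + 1);
    /* sizeof *p is a multiple of the union's alignment, so this holds. */
    assert((uintptr_t)a->array % _Alignof(uint64_t) == 0);
    return a;
}
```

Because the header is the first thing in the block, passing the returned pointer to free releases the whole allocation.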

Answer 3:

Before the publication of C89, there were some implementations that would attempt to identify and trap upon out-of-bounds array accesses. Given something like:

struct foo {int a[4],b[4];} *p;

such implementations would squawk at an effort to access p->a[i] if i wasn't in the range 0 to 3. For programs that don't need to index the address of array-type lvalue p->a to access anything outside that array, being able to trap on such out-of-bounds accesses would be useful.

The authors of C89 were also almost certainly aware that it was common for programs to use the address of dummy-sized array at the end of a structure as a means of accessing storage beyond the structure. Using such techniques made it possible to do things that couldn't be done nearly as nicely otherwise, and part of the Spirit of C, according to the authors of the Standard, is "Don't prevent the programmer from doing what needs to be done".

Consequently, the authors of the Standard treated such accesses as something which implementations could support or not, at their leisure, presumably based upon what would be most useful for their customers. While it would often be helpful for implementations which would normally bounds-check accesses to structures in an array, to provide an option to omit such checks in cases where the last item of an indirectly-accessed structure is an array with one element (or, if they extend the language to waive a compile-time constraint, zero elements), people writing such implementations would presumably be capable of recognizing such things without the authors of the Standard having to tell them. The notion that "Undefined Behavior" was intended as some form of prohibition doesn't seem to have really taken hold until after the publication of C89's successor standard.

With regard to your example, having a pointer within a struct point to later storage in the same allocation should work, but with a couple of caveats:

If the allocation is passed to realloc, the pointer within it will become invalid.
The only real advantage of using a pointer versus a flexible array member is that it allows for the possibility of having it point somewhere else. That may be good if the only kind of "something else" will always be a constant object of static duration that never has to be freed, or perhaps some other kind of object that won't have to be freed, but it may be problematic if the pointer could hold the only reference to something stored in a separate allocation.
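The realloc caveat can be illustrated with a short sketch. The struct and constructor mirror the question's code; array_new and array_append are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Array {
    size_t length;
    char *array;   /* points just past the header, inside the same block */
};

/* Same layout as the question's constructor. */
struct Array *array_new(const char *str) {
    struct Array *a;
    size_t length;
    assert(str != NULL);
    length = strlen(str);
    if (!(a = malloc(sizeof *a + length + 1))) return NULL;
    a->length = length;
    a->array = (char *)(a + 1);
    memcpy(a->array, str, length + 1);
    return a;
}

/* Hypothetical helper: grows the block and appends suffix.
 * On failure returns NULL and leaves the original block untouched. */
struct Array *array_append(struct Array *a, const char *suffix) {
    struct Array *grown;
    size_t extra;
    assert(a != NULL && suffix != NULL);
    extra = strlen(suffix);
    grown = realloc(a, sizeof *grown + a->length + extra + 1);
    if (!grown) return NULL;
    /* The stored pointer still refers to the old block: re-derive it. */
    grown->array = (char *)(grown + 1);
    memcpy(grown->array + grown->length, suffix, extra + 1);
    grown->length += extra;
    return grown;
}
```

With a flexible array member this re-derivation step disappears, because the array's location is implied by the header's address rather than stored in it.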

Flexible array members were available as an extension in some compilers before C89 was written, and were officially added in C99. Any decent compiler should support them.

Answer 4:

You can define struct Array as:

struct Array
{
    size_t length;
    char array[1];
}; /* +(length + 1) char */

then call malloc( sizeof *a + length ). The "+1" element for the terminating null is already provided by the array[1] member. Fill the structure with:

a->length = length;
strcpy( a->array, str );

Source: https://stackoverflow.com/questions/55014685/multiple-structures-in-a-single-malloc-invoking-undefined-behaviour

Tags

language-lawyer

c89

flexible-array-member