Is struct packing deterministic?

北荒 2020-12-13 18:00

For example, say I have two equivalent structs a and b in different projects:

typedef struct _a
{
    int a;
    double b;
    char         


        
8 Answers
  • 2020-12-13 18:03

    The compiler is deterministic; if it weren't, separate compilation would be impossible. Two different translation units with the same struct declaration will work together; that is guaranteed by §6.2.7/1: Compatible type and composite type.

    Moreover, two different compilers on the same platform should interoperate, although this is not guaranteed by the standard. (It's a quality of implementation issue.) To allow inter-operability, compiler writers agree on a platform ABI (Application Binary Interface) which will include a precise specification of how composite types are represented. In this way, it is possible for a program compiled with one compiler to use library modules compiled with a different compiler.

    But you are not just interested in determinism; you also want the layout of two different types to be the same.

    According to the standard, two struct types are compatible if their members (taken in order) are compatible, and if their tags and member names are the same. Since your example structs have different tags and names, they are not compatible even though their member types are, so you cannot use one where the other is required.

    It may seem odd that the standard allows tags and member names to affect compatibility. The standard requires that the members of a struct be laid out in declaration order, so names cannot change the order of members within the struct. Why, then, could they affect padding? I don't know of any compiler where they do, but the standard's flexibility is based on the principle that the requirements should be the minimum necessary to guarantee correct execution. Aliasing differently tagged structs is not permitted within a translation unit, so there is no need to condone it between different translation units. And so the standard does not allow it. (It would be legitimate for an implementation to insert information about the type in a struct's padding bytes, even if it needed to deterministically add padding to provide space for such information. The only restriction is that padding cannot be placed before the first member of a struct.)

    A platform ABI is likely to specify the layout of a composite type without reference to its tag or member names. On a particular platform, with a platform ABI which has such a specification and a compiler documented to conform to the platform ABI, you could get away with the aliasing, although it would not be technically correct, and obviously the preconditions make it non-portable.
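
    To make the "same layout" precondition explicit rather than assumed, the comparison can be done at compile time. This is only a sketch: `rec_a`, `rec_b` and `rec_layouts_match` are hypothetical names standing in for the two structs from the question, and it assumes a C11 compiler. It checks layout on this compiler only; it does not make the two types compatible in the standard's sense.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* Hypothetical stand-ins for the two differently tagged structs. */
struct rec_a { int a; double b; char c; };
struct rec_b { int d; double e; char f; };

/* Compilation fails if this compiler lays the two types out differently. */
static_assert(sizeof(struct rec_a) == sizeof(struct rec_b),
              "total size differs");
static_assert(offsetof(struct rec_a, a) == offsetof(struct rec_b, d),
              "offset of first member differs");
static_assert(offsetof(struct rec_a, b) == offsetof(struct rec_b, e),
              "offset of second member differs");
static_assert(offsetof(struct rec_a, c) == offsetof(struct rec_b, f),
              "offset of third member differs");

/* The same comparison as a runtime value: 1 if everything matches. */
int rec_layouts_match(void)
{
    return sizeof(struct rec_a) == sizeof(struct rec_b)
        && offsetof(struct rec_a, a) == offsetof(struct rec_b, d)
        && offsetof(struct rec_a, b) == offsetof(struct rec_b, e)
        && offsetof(struct rec_a, c) == offsetof(struct rec_b, f);
}
```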

  • 2020-12-13 18:05

    The C standard itself says nothing about it, so in principle you just cannot be sure.

    But: most probably your compiler adheres to some particular ABI, otherwise communicating with other libraries and with the operating system would be a nightmare. In that case, the ABI will usually prescribe exactly how packing works.

    For example:

    • on x86_64 Linux/BSD, the System V AMD64 ABI is the reference. Its §3.1 details, for every primitive processor data type, the corresponding C type, its size and its alignment requirement, and explains how to use this data to build the memory layout of bitfields, structs and unions; everything (besides the actual content of the padding) is specified and deterministic. The same holds for many other architectures, see these links.

    • ARM recommends its EABI for its processors, and it's generally followed by both Linux and Windows; the alignment of aggregates is specified in "Procedure Call Standard for the ARM Architecture Documentation", §4.3.

    • on Windows there's no cross-vendor standard, but VC++ essentially dictates the ABI, to which virtually every compiler adheres; it can be found here for x86_64, here for ARM (but for the part relevant to this question it just refers to the ARM EABI).
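
    As a rough illustration of how size and alignment data turn into a struct layout, here is a sketch of the usual rule (each member goes at the next offset that is a multiple of its alignment, and the total size is rounded up to the largest member alignment), compared against what the compiler actually does. The names `sample`, `layout`, `predicted_layout`, `actual_layout` and `layout_prediction_holds` are made up for this example, and the prediction is only expected to hold on ABIs that lay out structs this way.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignof (C11) */
#include <stddef.h>    /* offsetof, size_t */

struct sample { int a; double b; char c; };  /* hypothetical example type */

struct layout { size_t off_a, off_b, off_c, size; };

/* align_up below relies on alignments being powers of two. */
static_assert((alignof(double) & (alignof(double) - 1)) == 0,
              "alignment is not a power of two");

/* Round offset up to the next multiple of align. */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Layout predicted from sizes and alignments alone. */
struct layout predicted_layout(void)
{
    struct layout l;
    size_t max_align = alignof(double) > alignof(int) ? alignof(double)
                                                      : alignof(int);
    l.off_a = 0;  /* no padding before the first member */
    l.off_b = align_up(l.off_a + sizeof(int), alignof(double));
    l.off_c = align_up(l.off_b + sizeof(double), alignof(char));
    l.size  = align_up(l.off_c + sizeof(char), max_align);
    return l;
}

/* Layout the compiler really chose. */
struct layout actual_layout(void)
{
    struct layout l = { offsetof(struct sample, a), offsetof(struct sample, b),
                        offsetof(struct sample, c), sizeof(struct sample) };
    return l;
}

/* 1 if the prediction matches the compiler, 0 otherwise. */
int layout_prediction_holds(void)
{
    struct layout p = predicted_layout(), q = actual_layout();
    return p.off_a == q.off_a && p.off_b == q.off_b
        && p.off_c == q.off_c && p.size == q.size;
}
```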

  • 2020-12-13 18:08

    Any sane compiler will produce identical memory layout for the two structs. Compilers are usually written as perfectly deterministic programs. Non-determinism would need to be added explicitly and deliberately, and I for one fail to see the benefit of doing so.

    However, that does not allow you to cast a struct _a* to a struct _b* and access its data via both. As far as I know, this would still be a violation of the strict aliasing rules even if the memory layout is identical: the compiler may assume the two pointers do not alias and reorder accesses via the struct _a* with accesses via the struct _b*, which results in unpredictable, undefined behavior.
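
    If the bytes really do need to move from one type to the other, copying them with memcpy avoids the pointer cast. A minimal sketch, assuming the two layouts are identical on the compiler in use (the static_assert only checks the size); `rec_a`, `rec_b` and `rec_b_from_a` are hypothetical names for this example:

```c
#include <assert.h>   /* static_assert (C11) */
#include <string.h>   /* memcpy */

/* Hypothetical stand-ins for the two structs from the question. */
struct rec_a { int a; double b; char c; };
struct rec_b { int d; double e; char f; };

static_assert(sizeof(struct rec_a) == sizeof(struct rec_b),
              "layouts differ, the byte copy below would be wrong");

/* Copy the object representation instead of reading a rec_a through a
 * rec_b pointer, so no access goes through the wrong struct type. */
struct rec_b rec_b_from_a(const struct rec_a *src)
{
    struct rec_b dst;
    memcpy(&dst, src, sizeof dst);
    return dst;
}
```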

  • 2020-12-13 18:14

    Yes. You should always assume deterministic behaviour from your compiler.

    [EDIT] From the comments below, it is obvious there are many Java programmers reading the question above. Let's be clear: C structs do not generate any name, hash, or the like in object files, libraries, or DLLs. C function signatures do not refer to them either. This means the member names can be changed at whim - really! - provided the types and order of the member variables are the same. In C, the two structures in the example are equivalent, since packing does not change, which means that the following abuse is perfectly valid in C, and there's certainly much worse abuse to be found in some of the most widely-used libraries.

    [EDIT2] No one should ever dare to do any of the following in C++

    /* the 3 structures below are 100% binary compatible */
    struct _a { int a; double b; char c; };
    struct _b { int d; double e; char f; };
    typedef struct SOME_STRUCT { int my_i; double my_f; char my_c[1]; } SOME_STRUCT;
    
    struct _a a = { 1, 2.5, 'z' };
    struct _b b;
    
    /* the following is valid, copy a -> b (a is the initialized object) */
    *(SOME_STRUCT*)&b = *(SOME_STRUCT*)&a;
    assert(((SOME_STRUCT*)&a)->my_c[0] == b.f);
    assert(a.c == b.f);
    
    /* more generally these identities are always true. */
    assert(sizeof(a) == sizeof(b));
    assert(memcmp(&a, &b, sizeof(a)) == 0);  /* assumes the padding bytes were copied too */
    assert(pure_function_requiring_a(&a) == pure_function_requiring_a((struct _a*)&b));
    assert(pure_function_requiring_b((struct _b*)&a) == pure_function_requiring_b(&b));
    
    function_requiring_a_SOME_STRUCT_pointer(&a);  /* may generate a warning, but not all compilers will */
    /* etc... the name space abuse is limited to the programmer's imagination */
    
  • 2020-12-13 18:20

    Any particular compiler ought to be deterministic, but between any two compilers, or even the same compiler with different compilation options, or even between different versions of the same compiler, all bets are off.

    You're much better off if you don't depend on the details of the structure; if you do, you should embed code to check at runtime that the structure is actually laid out the way you depend on.

    A good example of this is the recent change from 32-bit to 64-bit architectures, where even if you didn't change the size of the integers used in a structure, the default packing of partial integers changed; where previously three 32-bit integers in a row would pack perfectly, now they pack into two 64-bit slots.

    You can't possibly anticipate what changes may occur in the future; if you depend on details that are not guaranteed by the language, such as structure packing, you ought to verify your assumptions at runtime.
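
    One possible shape for such a runtime check, as a sketch only: each component describes the layout it was compiled with, and the receiving side compares that description with its own before touching the data. `rec`, `rec_layout`, `rec_layout_local` and `rec_layout_compatible` are hypothetical names, and how the description travels between components is left open.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */

struct rec { int a; double b; char c; };  /* hypothetical shared struct */

/* The one layout fact the language does guarantee. */
static_assert(offsetof(struct rec, a) == 0,
              "no padding is allowed before the first member");

/* Layout facts one side records and the other side checks. */
struct rec_layout { size_t size, off_a, off_b, off_c; };

/* Layout as seen by the component this code is compiled into. */
struct rec_layout rec_layout_local(void)
{
    struct rec_layout l = { sizeof(struct rec), offsetof(struct rec, a),
                            offsetof(struct rec, b), offsetof(struct rec, c) };
    return l;
}

/* Returns 1 if the peer's reported layout matches ours, 0 otherwise. */
int rec_layout_compatible(const struct rec_layout *peer)
{
    struct rec_layout mine = rec_layout_local();
    return peer->size  == mine.size
        && peer->off_a == mine.off_a
        && peer->off_b == mine.off_b
        && peer->off_c == mine.off_c;
}
```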

  • 2020-12-13 18:23

    ISO C says that two struct types in different translation units are compatible if they have the same tag and members. More precisely, here is the exact text from the C99 standard:

    6.2.7 Compatible type and composite type

    Two types have compatible type if their types are the same. Additional rules for determining whether two types are compatible are described in 6.7.2 for type specifiers, in 6.7.3 for type qualifiers, and in 6.7.5 for declarators. Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: If one is declared with a tag, the other shall be declared with the same tag. If both are complete types, then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types, and such that if one member of a corresponding pair is declared with a name, the other member is declared with the same name. For two structures, corresponding members shall be declared in the same order. For two structures or unions, corresponding bit-fields shall have the same widths. For two enumerations, corresponding members shall have the same values.

    It seems very strange if we interpret it from the point of view of, "what, the tag or member names could affect padding?" But basically the rules are simply as strict as they can possibly be while allowing the common case: multiple translation units sharing the exact text of a struct declaration via a header file. If programs follow looser rules, they aren't wrong; they are just not relying on requirements for behavior from the standard, but from elsewhere.

    In your example, you are running afoul of the language rules, by having only structural equivalence, but not equivalent tag and member names. In practice, this is not actually enforced; struct types with different tags and member names in different translation units are de facto physically compatible anyway. All sorts of technology depends on this, such as bindings from non-C languages to C libraries.

    If both your projects are in C (or C++), it would probably be worth the effort to try to put the definition into a common header.

    It's also a good idea to put in some defense against versioning issues, such as a size field:

    #include <stddef.h> // size_t

    // Widely shared definition between projects affecting interop!
    // Do not change any of the members.
    // Add new ones only at the end!
    typedef struct a
    {
        size_t size; // of whole structure
        int a;
        double b;
        char c;
    } a;
    

    The idea is that whoever constructs an instance of a must initialize the size field to sizeof (a). Then when the object is passed to another software component (perhaps from the other project), it can check the size against its sizeof (a). If the size field is smaller, then it knows that the software which constructed a is using an old declaration with fewer members. Therefore, the nonexistent members must not be accessed.
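
    A sketch of what the receiving side's check might look like. Here `a_v1` stands for the old declaration and `a_v2` for a newer one with a hypothetical member `extra` appended; `A_HAS_MEMBER` and `a_get_extra` are likewise invented for this example.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */

/* Old declaration, as in the definition above. */
typedef struct a_v1 { size_t size; int a; double b; char c; } a_v1;

/* Newer declaration: one member (hypothetical) appended at the end. */
typedef struct a_v2 { size_t size; int a; double b; char c; double extra; } a_v2;

/* Appending must not move the members both versions share. */
static_assert(offsetof(a_v1, c) == offsetof(a_v2, c),
              "shared members moved between versions");

/* True when `member` lies entirely within the size the producer reported. */
#define A_HAS_MEMBER(p, member) \
    ((p)->size >= offsetof(a_v2, member) + sizeof((p)->member))

/* Read `extra` only if the producer's declaration had it. */
double a_get_extra(const a_v2 *p, double fallback)
{
    return A_HAS_MEMBER(p, extra) ? p->extra : fallback;
}
```

    One caveat with this scheme: a small new member can land in what used to be tail padding of the old struct, in which case the size does not change and the check cannot tell the versions apart.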
