Is it legal to write to a byte array in a union and read from an int to convert values in MISRA C?

跟風遠走, submitted on 2020-04-13 08:05:11


I guess this must have been asked before, but I could not get a specific yes/no answer.

I have this code snippet :

union integer_to_byte
{
    signed int  IntPart;
    unsigned char BytePart[2];
};

typedef union integer_to_byte I2B;

   I2B u16VarNo;

       // some code....
       u16VarNo.BytePart[1]= P1;

       // some more code ....
       u16VarNo.BytePart[0]= P2;

       // still more code ...
       if(u16VarNo.IntPart != 0xFFFF)

Is this a legal way to use unions in C? From what I have read, only the last assigned union member is valid. So is "u16VarNo.BytePart[1]" indeterminate? The code I have written works perfectly as expected, but I thought I would get it clarified.



Is it legal to write to a byte array in a union and read from an int to convert values in MISRA C?

No. Unions shall not be used.

MISRA C:2004, 18.4 - Unions shall not be used.
MISRA C:2012, 19.2 - The union keyword should not be used

The rule in MISRA C:2004 follows with:

It is recognised nonetheless that there are situations in which the careful use of unions is desirable in constructing an efficient implementation. In such situations, deviations to this rule are considered acceptable provided that all relevant implementation-defined behaviour is documented. This might be achieved in practice by referencing the implementation section of the compiler manuals from the design documentation.


The use of deviations is acceptable for (a) packing and unpacking of data, for example when sending and receiving messages, and (b) implementing variant records provided that the variants are differentiated by a common field.

Your use does not fit those cases.


Formally, unions are not allowed, though this rule was relaxed to Advisory between MISRA-C:2004 and MISRA-C:2012. The main purpose of banning unions was always to prevent really dumb things like creating "variant types" à la Visual Basic, or re-using the same memory area for unrelated purposes.

But using a union for type punning is common practice, particularly in embedded systems, so banning unions outright is cumbersome too. Rule 19.2 raises the valid concern that writing to one union member and then reading from another invokes unspecified or implementation-defined behavior: unspecified if the members' sizes don't match, otherwise implementation-defined, since conversions are involved.

Further concerns from MISRA about breaking the rule are padding, alignment, endianness and bit order (in the case of bit-fields). These are also valid concerns: in your specific example, many of them are potential issues.

My advice is this:

  • Deviate from this rule only if you know what you are doing. Valid cases of type punning through unions are register map declarations and serialization/de-serialization code.
  • Using a union to get the high and low bytes of some int is not a valid use case; it's just bad, because it makes the code needlessly non-portable for nothing gained. Assuming a 16-bit system, there's absolutely no reason why you can't replace this union with portable bit operators instead:

    int16_t some_int = ...;
    uint8_t ms = (uint16_t)some_int >> 8;
    uint8_t ls = some_int & 0xFF;
  • Ensure that padding isn't an issue by (pseudo code) _Static_assert( sizeof(the_union) == sizeof(all_members)...

  • Document any code that disables padding, both in your source code with comments and in your MISRA-C implementation document. Stuff like #pragma pack(1) or whatever your specific compiler uses.
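
The advice above can be sketched in C roughly as follows. This is only a minimal sketch under stated assumptions: the names `split_i16`, `checked_pun` and `split_i16_demo` are hypothetical, and the padding check assumes a C11 compiler (for `_Static_assert`) that provides `uint8_t` and `uint16_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract the high and low bytes with shifts and
   masks. Converting to uint16_t first avoids shifting a negative value. */
void split_i16(int16_t value, uint8_t *ms, uint8_t *ls)
{
    uint16_t u = (uint16_t)value;
    *ms = (uint8_t)(u >> 8);
    *ls = (uint8_t)(u & 0xFFu);
}

/* Hypothetical union, used only to show the padding check. */
union checked_pun {
    uint16_t word;
    uint8_t  bytes[2];
};

/* Compile-time guard: the build fails if the compiler padded the union. */
_Static_assert(sizeof(union checked_pun) == sizeof(uint16_t),
               "unexpected padding in union checked_pun");

/* Hypothetical self-check: the result does not depend on byte order. */
void split_i16_demo(void)
{
    uint8_t ms, ls;
    split_i16((int16_t)0x1234, &ms, &ls);
    assert(ms == 0x12u && ls == 0x34u);
}
```

Unlike the union, the shift-and-mask version gives the same answer on little-endian and big-endian targets.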


Union punning, especially with the gcc family (or IAR, GHS, ARM and many other compilers), is 100% fine.

All compilers I know follow footnote 95 (C11 6.5.2.3):

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
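
To illustrate what the footnote describes, here is a minimal sketch. It assumes a compiler that follows footnote 95 and a platform that provides `uint16_t`. The names `pun16`, `pun_bytes` and `pun_demo` are hypothetical. The bytes are reinterpreted in place, so the value read back depends on the target's byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical union: both members cover exactly the same two bytes. */
union pun16 {
    uint16_t word;
    uint8_t  bytes[2];
};

/* Store two bytes, then read them back through the other member. */
uint16_t pun_bytes(uint8_t first, uint8_t second)
{
    union pun16 u;
    u.bytes[0] = first;
    u.bytes[1] = second;
    return u.word;
}

/* Hypothetical self-check: only the two byte orders are accepted. */
void pun_demo(void)
{
    uint16_t w = pun_bytes(0x12u, 0x34u);
    /* 0x3412 on a little-endian target, 0x1234 on a big-endian one. */
    assert(w == 0x3412u || w == 0x1234u);
}
```

The sketch uses `uint16_t` rather than `signed int` so that the two members have the same size and no byte of the object is left unwritten.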


In ordinary C — only following the rules of ISO C, not the additional rules added by MISRA — the construct shown is conforming, but not strictly conforming, because it depends on unspecified behavior. "Unspecified" means, in this case, that the read from u16VarNo.IntPart is allowed to give you a value that doesn't make any sense at all, but it is not allowed to crash your program, and the compiler is not allowed to optimize on the assumption that the read can never be executed.

The precise rule is C2011 section 6.2.6.1 paragraph 7:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

u16VarNo.BytePart[1]= P1 stores a value in a member of an object of union type. That union has two other members, BytePart[0] and IntPart¹; both of them cover at least one byte of the object representation that doesn't correspond to BytePart[1] (depending on exactly how big a signed int is); that byte takes on an unspecified value when you write to BytePart[1].

The practical upshot of this is that after

u16VarNo.BytePart[1] = 0xFF;
u16VarNo.BytePart[0] = 0xFF;

you are allowed to read from u16VarNo.IntPart, but the value you get may well be garbage. In particular,

assert(u16VarNo.IntPart == 0xFFFF);   // THIS ASSERTION MAY FAIL
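
One concrete way this goes wrong is when `signed int` is wider than two bytes. The sketch below models that case with a hypothetical union `i2b32` whose integer member is an `int32_t`; `read_after_byte_writes` and `stale_bytes_demo` are likewise made-up names. It assumes a compiler that simply leaves the unwritten bytes in place, which the standard does not guarantee (they are unspecified), so treat it as an illustration rather than a specification.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical union modelling a target whose signed int is 32 bits wide. */
union i2b32 {
    int32_t IntPart;
    uint8_t BytePart[2];
};

/* Writes both bytes, then reads the int. `previous` stands in for
   whatever happened to be in the object's storage beforehand. */
int32_t read_after_byte_writes(uint8_t previous)
{
    union i2b32 u;
    memset(&u, previous, sizeof u);
    u.BytePart[1] = 0xFFu;
    u.BytePart[0] = 0xFFu;
    return u.IntPart;
}

/* Hypothetical self-check: the same two byte writes typically give
   different results, because the two bytes that BytePart does not
   cover still contribute to IntPart. */
void stale_bytes_demo(void)
{
    assert(read_after_byte_writes(0x00u) != read_after_byte_writes(0xAAu));
}
```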

I am only vaguely familiar with MISRA's additional rules but I have the impression that they flat-out forbid you to do anything even sort of like this.

The correct way to convert two bytes of data from an external source into a 16-bit signed integer is with helper functions like this:

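#include <stdint.h>

int16_t be16_to_cpu_signed(const uint8_t data[static 2])
{
    uint32_t val = (((uint32_t)data[0]) << 8) |
                   (((uint32_t)data[1]) << 0);
    /* Subtract 0x10000 only when the sign bit (bit 15) is set. */
    return (int16_t)(((int32_t) val) - ((val & 0x8000u) ? 0x10000 : 0));
}

int16_t le16_to_cpu_signed(const uint8_t data[static 2])
{
    uint32_t val = (((uint32_t)data[0]) << 0) |
                   (((uint32_t)data[1]) << 8);
    /* Subtract 0x10000 only when the sign bit (bit 15) is set. */
    return (int16_t)(((int32_t) val) - ((val & 0x8000u) ? 0x10000 : 0));
}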

There are two functions because you need to know, and specify in your code, which endianness the external source provides the data in. (This is another, unconnected reason why your original code can't be relied on.) You have to use a 32-bit unsigned intermediate because the constant 0x10000 doesn't fit in a 16-bit register. You have to include all of those explicit casts to stdint.h fixed-width types because otherwise the "usual arithmetic conversions" will make the math be done on int values, which is wrong for this code.
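
A short usage sketch for the two helpers follows. The function bodies are repeated so that the example compiles on its own, with the sign adjustment written as an explicit conditional (a variation of mine, chosen so the result always fits in `int16_t`). The buffer contents and the name `decode_demo` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian: data[0] is the most significant byte. */
int16_t be16_to_cpu_signed(const uint8_t data[static 2])
{
    uint32_t val = ((uint32_t)data[0] << 8) | (uint32_t)data[1];
    /* Map 0x8000..0xFFFF onto -32768..-1; smaller values pass through. */
    return (int16_t)((int32_t)val - ((val & 0x8000u) ? 0x10000 : 0));
}

/* Little-endian: data[0] is the least significant byte. */
int16_t le16_to_cpu_signed(const uint8_t data[static 2])
{
    uint32_t val = (uint32_t)data[0] | ((uint32_t)data[1] << 8);
    return (int16_t)((int32_t)val - ((val & 0x8000u) ? 0x10000 : 0));
}

/* Hypothetical caller: the same two bytes decode to different values
   depending on which byte order the external source uses. */
void decode_demo(void)
{
    const uint8_t wire[2] = { 0xFFu, 0xFEu };
    assert(be16_to_cpu_signed(wire) == -2);    /* bytes read as 0xFFFE */
    assert(le16_to_cpu_signed(wire) == -257);  /* bytes read as 0xFEFF */
}
```

Because the byte order is chosen by the function name rather than by the host CPU, the result is the same on every target.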

¹ Whether or not BytePart[0] and BytePart[1] are two separate members of the union is ill-specified; this is an instance of the "what exactly is an 'object'" argument that has been unresolved since the original publication of the 1989 C standard, despite multiple attempts to fix the wording. However, it's not safe to assume compilers won't treat them as two separate objects.