Why is the size of an enum always 2 or 4 bytes (on a 16- or 32-bit architecture respectively), regardless of the number of enumerators in the type?
It seems to me that the OP has assumed that an enum is some kind of collection which stores the values declared in it. This is incorrect.
An enumeration in C/C++ is simply a numeric type with a strictly defined range of values; the enumerator names are essentially aliases for numbers.
The storage size is not influenced by the number of values in the enumeration. The storage size is implementation-defined, but in most cases it is sizeof(int).
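A quick way to see this is to compare two enumerations with very different numbers of enumerators. A minimal sketch (the names are made up for illustration, and the exact sizes printed are implementation-defined):
#include <stdio.h>

/* Hypothetical enumerations with very different numbers of enumerators. */
enum few  { ONE, TWO };
enum many { V0, V1, V2, V3, V4, V5, V6, V7, V8, V9,
            V10, V11, V12, V13, V14, V15, V16, V17, V18, V19 };

int main(void)
{
    /* On most implementations all three lines print the same number,
       typically sizeof(int); the exact result is implementation-defined. */
    printf("sizeof(enum few)  = %zu\n", sizeof(enum few));
    printf("sizeof(enum many) = %zu\n", sizeof(enum many));
    printf("sizeof(int)       = %zu\n", sizeof(int));
    return 0;
}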
The size of an enum is implementation-defined -- the compiler is allowed to choose whatever size it wants, as long as it's large enough to fit all of the values. Some compilers choose to use 4-byte enums for all enum types, while some compilers will choose the smallest type (e.g. 1, 2, or 4 bytes) which can fit the enum values. The C and C++ language standards allow both of these behaviors.
From C99 §6.7.2.2/4:
Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined, but shall be capable of representing the values of all the members of the enumeration.
From C++03 §7.2/5:
The underlying type of an enumeration is an integral type that can represent all the enumerator values defined in the enumeration. It is implementation-defined which integral type is used as the underlying type for an enumeration except that the underlying type shall not be larger than int unless the value of an enumerator cannot fit in an int or unsigned int. If the enumerator-list is empty, the underlying type is as if the enumeration had a single enumerator with value 0. The value of sizeof() applied to an enumeration type, an object of enumeration type, or an enumerator, is the value of sizeof() applied to the underlying type.
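In C++11 and later you can also ask the compiler which underlying type it picked for a given enumeration. A small sketch (the color enum is made up, and the printed size is implementation-defined):
#include <cstdio>
#include <type_traits>

enum color { red, green, blue };

int main()
{
    // The enumeration and its implementation-chosen underlying type
    // have the same size by definition; here we only observe that size.
    std::printf("sizeof(color)           = %zu\n", sizeof(color));
    std::printf("sizeof(underlying type) = %zu\n",
                sizeof(std::underlying_type<color>::type));
    return 0;
}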
In both C and C++, the size of an enum type is implementation-defined, and is the same as the size of some integer type.
A common approach is to make all enum types the same size as int, simply because that's typically the type that makes for the most efficient access. Making it a single byte, for example, would save a very minor amount of space, but could require bigger and slower code to access it, depending on the CPU architecture.
In C, enumeration constants are by definition of type int. So given:
enum foo { zero, one, two };
enum foo obj;
the expression zero is of type int, but obj is of type enum foo, which may or may not have the same size as int. Given that the constants are of type int, it tends to be easier to make the enumerated type the same size.
In C++, the rules are different; the constants are of the enumerated type. But again, it often makes the most sense for each enum type to be one "word", which is typically the size of int, for efficiency reasons.
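A quick way to confirm the C++ rule, as a minimal C++11 sketch (redeclaring the foo enumeration from above so the example stands alone):
#include <type_traits>

enum foo { zero, one, two };

// In C++ the enumerator 'zero' has type 'foo', not 'int'
// (in C it would have type 'int').
static_assert(std::is_same<decltype(zero), foo>::value,
              "enumerators have the enumeration type in C++");

int main() { return 0; }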
And the 2011 ISO C++ standard added the ability to specify the underlying integer type for an enum type. For example, you can now write:
enum foo: unsigned char { zero, one, two };
which guarantees that both the type foo and the constants zero, one, and two have a size of 1 byte. C does not have this feature, and it's not supported by older pre-2011 C++ compilers (unless they provide it as a language extension).
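For example, a conforming C++11 compiler must accept the following check, since the underlying type is specified as unsigned char:
enum foo : unsigned char { zero, one, two };

// sizeof(unsigned char) is 1 by definition, and an enumeration type
// has the same size as its underlying type.
static_assert(sizeof(foo) == 1, "foo occupies exactly one byte");

int main() { return 0; }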
(Digression follows.)
So what if you have an enumeration constant too big to fit in an int? You don't need 2^31, or even 2^15, distinct constants to do this:
#include <limits.h>
enum huge { big = INT_MAX, bigger };
The value of big is INT_MAX, which is typically 2^31-1, but can be as small as 2^15-1 (32767). The value of bigger is implicitly big + 1.
In C++, this is ok; the compiler will simply choose an underlying type for huge that's big enough to hold the value INT_MAX + 1. (Assuming there is such a type; if int is 64 bits and there's no integer type bigger than that, that won't be possible.)
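A sketch of what a C++ compiler does with such a constant (the printed sizes are implementation-defined; on a typical platform with a 32-bit int, the underlying type ends up wider than int):
#include <climits>
#include <cstdio>

enum huge { big = INT_MAX, bigger };

int main()
{
    // 'bigger' is INT_MAX + 1, so the underlying type chosen for 'huge'
    // must be able to represent a value one past INT_MAX.
    std::printf("sizeof(int)  = %zu\n", sizeof(int));
    std::printf("sizeof(huge) = %zu\n", sizeof(huge));
    std::printf("bigger       = %llu\n",
                static_cast<unsigned long long>(bigger));
    return 0;
}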
In C, since enumeration constants are of type int, the above is invalid. It violates the constraint stated in N1570 6.7.2.2p2:
The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an int.
and so a compiler must reject it, or at least warn about it. gcc, for example, says:
error: overflow in enumeration values
The size of an enum is that of "an integral type at least large enough to contain any of the values specified in the declaration". Many compilers will just use an int (possibly unsigned), but some will use a char or short, depending on optimization or other factors. An enum with fewer than 128 possible values would fit in a char (256 for unsigned char), and you would have to have 32768 (or 65536) values to overflow a short, and either 2 or 4 billion values to outgrow an int on most modern systems.
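Whether a compiler actually uses those smaller types is implementation-specific (gcc, for example, only does so with -fshort-enums). A hypothetical sketch of the value ranges involved:
#include <stdio.h>

/* Hypothetical enums whose value ranges would fit in progressively
   wider integer types. */
enum fits_in_char  { C_MAX = 127 };
enum fits_in_short { S_MAX = 32767 };
enum needs_int     { I_MAX = 2147483647 };

int main(void)
{
    /* Many compilers print the same size (that of int) for all three;
       the choice is implementation-defined. */
    printf("%zu %zu %zu\n",
           sizeof(enum fits_in_char),
           sizeof(enum fits_in_short),
           sizeof(enum needs_int));
    return 0;
}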
An enum is essentially just a better way of defining a bunch of different constants. Instead of this:
#define FIRST 0
#define SECOND 1
...
you just:
enum myenum {
    FIRST,
    SECOND,
    ...
};
It helps you avoid assigning duplicate values by mistake, and means you don't even have to care what the particular values are (unless you really need to).
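For instance, with macros nothing stops two names from silently ending up with the same value, while the enum numbers its constants for you. A small made-up illustration:
#include <stdio.h>

#define FIRST  0
#define SECOND 1
#define THIRD  1   /* oops: duplicate value, and nothing complains */

enum myenum { E_FIRST, E_SECOND, E_THIRD };  /* 0, 1, 2 assigned automatically */

int main(void)
{
    printf("SECOND == THIRD:     %d\n", SECOND == THIRD);      /* 1 (true)  */
    printf("E_SECOND == E_THIRD: %d\n", E_SECOND == E_THIRD);  /* 0 (false) */
    return 0;
}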
An enum is not a structure, it's just a way of giving names to a set of integers. The size of a variable with this type is just the size of the underlying integer type.
The big problem with making an enum type smaller than int when a smaller type could fit all the values is that it would make the ABI for a translation unit dependent on the number of enumeration constants. For instance, suppose you have a library that uses an enum type with 256 constants as part of its public interface, and the compiler chooses to represent the type as a single byte. Now suppose you add a new feature to the library and now need 257 constants. The compiler would have to switch to a new size/representation, and now all object files compiled for the old interface would be incompatible with your updated library; you would have to recompile everything to make it work again.
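To make that concrete, here is a hypothetical sketch: if the compiler picked the smallest type that fits, every structure layout that embeds the enum would change as soon as the constant count crossed a size boundary:
#include <stdio.h>

/* Hypothetical public type from a library's interface. */
enum status { STATUS_OK, STATUS_WARNING, STATUS_FAILURE
              /* ... imagine up to 256 constants ... */ };

struct record {
    enum status s;   /* if enum status shrank to one byte, the size and field */
    int         id;  /* offsets of struct record could change, breaking every */
    int         flags; /* object file built against the old definition */
};

int main(void)
{
    printf("sizeof(enum status)  = %zu\n", sizeof(enum status));
    printf("sizeof(struct record) = %zu\n", sizeof(struct record));
    return 0;
}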
Thus, any sane implementation always uses int for enum types.