Understanding sizeof(char) in 32 bit C compilers

情歌与酒 2020-12-17 19:18

sizeof(char) always returns 1 with the 32-bit GCC compiler.

But since the basic block size in a 32-bit compiler is 4, how does char occup

8 answers
  • 2020-12-17 19:28

    sizeof(char) is always 1. Always. The 'block size' you're talking about is just the native word size of the machine - usually the size that will result in most efficient operation. Your computer can still address each byte individually - that's what the sizeof operator is telling you about. When you do sizeof(int), it returns 4 to tell you that an int is 4 bytes on your machine. Likewise, your structure is 8 bytes long. There is no information from sizeof about how many bits there are in a byte.

    The reason your structure is 8 bytes long rather than 5 (as you might expect), is that the compiler is adding padding to the structure in order to keep everything nicely aligned to that native word length, again for greater efficiency. Most compilers give you the option to pack a structure, either with a #pragma directive or some other compiler extension, in which case you can force your structure to take minimum size, regardless of your machine's word length.

    char is size 1, since that's the smallest access size your computer can handle - for most machines an 8-bit value. The sizeof operator gives you the size of all other quantities in units of how many char objects would be the same size as whatever you asked about. The padding (see link below) is added by the compiler to your data structure for performance reasons, so it is larger in practice than you might think from just looking at the structure definition.

    There is a Wikipedia article called "Data structure alignment" which has a good explanation and examples.

  • 2020-12-17 19:30

    All object sizes in C and C++ are defined in terms of bytes, not bits. A byte is the smallest addressable unit of memory on the computer. A bit is a single binary digit, a 0 or a 1.

    On most computers, a byte is 8 bits (so a byte can store values from 0 to 255), although computers exist with other byte sizes.

    A memory address identifies a byte, even on 32-bit machines. Addresses N and N+1 point to two consecutive bytes.

    An int, which is typically 32 bits, covers 4 bytes, meaning that 4 different memory addresses exist that each point to part of the int.

    In a 32-bit machine, all the 32 actually means is that the CPU is designed to work efficiently with 32-bit values, and that an address is 32 bits long. It doesn't mean that memory can only be addressed in blocks of 32 bits.

    The CPU can still address individual bytes, which is useful when dealing with chars, for example.

    As for your example:

    struct st
    {
        int a;
        char c;
    };
    

    sizeof(st) returns 8 not because all structs have a size divisible by 4, but because of alignment. For the CPU to efficiently read an integer, it must be located at an address that is divisible by the size of the integer (4 bytes). So an int can be placed at address 8, 12 or 16, but not at address 11.

    A char only requires its address to be divisible by the size of a char (1), so it can be placed on any address.

    So in theory, the compiler could have given your struct a size of 5 bytes... Except that this wouldn't work if you created an array of st objects.

    In an array, each object is placed immediately after the previous one, with no padding between elements. So if the first object in the array is placed at an address divisible by 4, the next object would be placed at an address 5 bytes higher, which would not be divisible by 4, and so the second struct in the array would not be properly aligned.

    To solve this, the compiler inserts padding inside the struct, so its size becomes a multiple of its alignment requirement.

    This is not because it is impossible to create objects whose size is not a multiple of 4, but because one of the members of your st struct requires 4-byte alignment, and so every time the compiler places an int in memory, it has to make sure it is placed at an address that is divisible by 4.

    If you create a struct of two chars, it won't get a size of 4. It will usually get a size of 2, because when it contains only chars, the object can be placed at any address, and so alignment is not an issue.

  • 2020-12-17 19:42

    This is structure alignment: c uses 1 byte and the remaining 3 bytes are unused padding. More here (with pictures!)

  • 2020-12-17 19:44

    Sample code demonstrating structure alignment:

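    #include <iostream>

    struct st
    {
        int a;
        char c;
    };

    struct stb
    {
        int a;
        char c;
        char d;
        char e;
        char f;
    };

    struct stc
    {
        int a;
        char c;
        char d;
        char e;
        char f;
        char g;
    };

    int main()
    {
        // Sizes observed with a 32-bit compiler where int is 4 bytes.
        std::cout << sizeof(st) << std::endl;  // 8: 5 bytes of members + 3 bytes of padding
        std::cout << sizeof(stb) << std::endl; // 8: 8 bytes of members, no padding needed
        std::cout << sizeof(stc) << std::endl; // 12: 9 bytes of members + 3 bytes of padding
    }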
    

    The size of a struct can be bigger than the sum of its individual members, because the compiler pads it to a multiple of 4 bytes, the alignment of int on this 32-bit compiler. These results may differ between compilers and targets, for example with a 64-bit compiler.

  • 2020-12-17 19:45

    sizeof returns the size in bytes, whereas you were talking about bits. 32-bit architectures are word aligned and byte addressed. It is irrelevant how the architecture stores a char; to the compiler, chars are referenced 1 byte at a time, even if they use up less than 1 byte.

    This is why sizeof(char) is 1.

    On such a platform, ints are 32 bits, hence sizeof(int) = 4; doubles are 64 bits, hence sizeof(double) = 8; and so on.

  • 2020-12-17 19:48

    It works the same way as using half a piece of paper. You use one part for a char and the other part for something else. The compiler hides this from you, since how a char is loaded into and stored from a 32-bit processor register depends on the processor.

    Some processors have instructions to load and store only part of a 32-bit word; others have to use bitwise operations to extract the value of a char.

    Addressing a char works because it is, AFAIR, by definition the smallest addressable unit of memory. On a 32-bit system, pointers to two different ints will be at least 4 addresses apart, while char addresses can be only 1 apart.
