Order of memory allocation in C

Question

I am trying to understand how the computer/OS/compiler (I am not sure which of them owns memory allocation, hence my noob-ish question) assigns memory addresses to local variables.

I have this simple program:

#include <stdio.h>

int main(int argc, char** argv) {
    printf("hello, world\n");
    int arr[10];
    int a = 1;
    int b = 2;
    int c;
    for (int i = 0; i < 10; i++) {
        /* %p expects a void *, so each address is cast explicitly */
        printf("Variable i: %p\n", (void *)&i);
        printf("Variable arr[i]: %p\n", (void *)&arr[i]);
    }
    printf("Variable a: %p\n", (void *)&a);
    printf("Variable b: %p\n", (void *)&b);
    printf("Variable c: %p\n", (void *)&c);
    return 0;
}

There are two main things I don't understand.

Why does variable i get an earlier memory address than variable arr, and why do variables a and b get even earlier ones? It appears to have something to do with when you actually use the variable or assign it a value.
How/why does the OS (or whoever is responsible) use the same memory address for variable c and variable i? Obviously i goes out of scope, but c was declared before it.

Here is the output from the program:

hello, world
Variable i: 0x7ffd60b1696c
Variable arr[i]: 0x7ffd60b16970
Variable i: 0x7ffd60b1696c
Variable arr[i]: 0x7ffd60b16974
Variable i: 0x7ffd60b1696c
Variable arr[i]: 0x7ffd60b16978
Variable i: 0x7ffd60b1696c
Variable arr[i]: 0x7ffd60b1697c
Variable i: 0x7ffd60b1696c
Variable arr[i]: 0x7ffd60b16980
Variable i: 0x7ffd60b1696c
Variable arr[i]: 0x7ffd60b16984
Variable i: 0x7ffd60b1696c
Variable arr[i]: 0x7ffd60b16988
Variable i: 0x7ffd60b1696c
Variable arr[i]: 0x7ffd60b1698c
Variable i: 0x7ffd60b1696c
Variable arr[i]: 0x7ffd60b16990
Variable i: 0x7ffd60b1696c
Variable arr[i]: 0x7ffd60b16994
Variable a: 0x7ffd60b16964
Variable b: 0x7ffd60b16968
Variable c: 0x7ffd60b1696c

I am running Ubuntu 18 and compiling with gcc 7.4.0 in C99 mode.

Answer 1:

Modern compilers typically do not assign memory to objects using any simple method. Suppose you were given several varied objects and told to store them on a shelf efficiently. You likely would not just put each object on a shelf in the same order you got them. You would probably stack similar objects (if they were stackable), and otherwise arrange objects to use space efficiently. Compilers do the same thing.

Suppose a compiler is going to assign memory to all the objects defined in a function. Rather than just read the function and assign memory as soon as it sees each definition, a compiler may read the entire function and remember information about all the definitions. Then it may organize all the objects of the same sizes together and then sort the objects by sizes.

One reason it does this is that computers often have alignment requirements or benefits. Objects that are four bytes wide often must be located at memory addresses that are multiples of four bytes. (One reason for this is that, on a 32-bit machine, the connections between the processor and memory, and the connections within the processor, are four bytes wide: they effectively use 32 wires to carry 32 bits. Moving 32 bits from place to place is easy, but shifting the bits in units of less than 32 bits requires additional devices inside the processor.) Since your question does not involve objects of different widths, I will not go into this aspect further.

Since the compiler is reading the entire function, it has to remember all the objects you define. In your example, these include arr, a, b, and c. To do this, the compiler uses some data structure to remember them. One of the first data structures you will learn about is a simple list. The compiler could keep a list of defined objects and names. It could keep the list in the order the compiler sees the names (arr, a, b, c), or it could keep the list in alphabetical order (a, arr, b, c). Or it might keep the list in order by size or other features, perhaps a, b, c, arr if sorted by size.

However, it turns out simple lists are inefficient. If we try keeping a list in alphabetical order, then elements have to be moved every time we want to put a new name in the middle. Even a list that is just kept in the order we see the names, so that new names are just added to the end, not requiring any movement, is troublesome when we want to do fancier things with the data, like sorting the list by alignment requirements or size.

So compilers use fancier data structures for managing this information. As the compiler sees definitions, it enters the names into its data structures, which may use a variety of methods for organizing the data. Later, when the compiler is allocating memory for all the objects, the order in which they are processed is a result of how the data structure organized them. It is not a clear or simple result of how the names appear in your source code.

So, in general, there is no reason to expect that a compiler will allocate memory in an order related to the order in which names appear in your source code.

More than this, in most functions, the compiler does not assign fixed memory to many objects at all. A compiler might hold a variable only in a processor register, not in memory, or it might use different memory for the variable at different times during the execution of the function. In your example, the compiler has to assign memory for the objects because you take their addresses. In code that did not take the addresses of these variables, the compiler likely would not store them in memory at all: the function is so simple that the processor could get the work done using just processor registers, and the compiler might even remove some of the code entirely during optimization.

Answer 2:

C doesn't specify any of this stuff. The entire question concerns the internal details of some specific compiler on some specific platform.

What the standard does say is essentially just that distinct objects (variables etc.) must have distinct addresses. How those addresses are allocated, or even what addresses really are: these are implementation details.

"Why does variable i get an earlier memory address than variable arr"

Because that's what the compiler decided to do. It could have chosen the reverse order, or put them in entirely different storage areas if it wanted. The compiler could choose to reverse the order on odd-numbered days if it wants. Nothing is specified at all by the language, much less guaranteed.

"It appears to have something to do with when you actually use the variable or assign it a value."

A good optimiser may well choose to do this because it minimises the amount of storage used for locals. But (stop me if this sounds familiar) it's an implementation detail. It could change with different compiler flags or, well, anything.

Answer 3:

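The compiler determines the layout of the variables: for locals like these, that means their offsets within the function's stack frame. The actual addresses are determined at run time, when the operating system decides where the stack is placed in the process's address space.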

"Why does variable i get an earlier memory address than variable arr, and why do variables a and b get even earlier ones? It appears to have something to do with when you actually use the variable or assign it a value."

Possibly optimization, or possibly just the way the compiler places arrays on the stack by default. Changing the layout of the variables does not affect how the program executes.

"How/why does the OS (or whoever is responsible) use the same memory address for variable c and variable i? Obviously i goes out of scope, but c was declared before it."

The variable c isn't used, so the behavior of the program doesn't depend on the address of i and c being different. If you assign c a value, the address will probably change.

Source: https://stackoverflow.com/questions/58148075/order-of-memory-allocation-in-c

Tags

memory