Why does this for loop exit on some platforms and not on others?

执念已碎 2020-12-12 09:00

I have recently started to learn C and I am taking a class with C as the subject. I'm currently playing around with loops and I'm running into some odd behaviour which I d

14 answers
  • 2020-12-12 09:35

    In what should be the last run of the loop, you write to array[10], but the array has only 10 elements, numbered 0 through 9. The C language specification says that this is “undefined behavior”. What this means in practice is that your program will attempt to write to the int-sized piece of memory that lies immediately after array in memory. What happens then depends on what does, in fact, lie there, and this depends not only on the operating system but even more on the compiler, on the compiler options (such as optimization settings), on the processor architecture, on the surrounding code, etc. It could even vary from execution to execution, e.g. due to address space layout randomization (probably not on this toy example, but it does happen in real life). Some possibilities include:

    • The location wasn't used. The loop terminates normally.
    • The location was used for something which happened to have the value 0. The loop terminates normally.
    • The location contained the function's return address. The loop terminates normally, but then the program crashes because it tries to jump to the address 0.
    • The location contains the variable i. The loop never terminates because i restarts at 0.
    • The location contains some other variable. The loop terminates normally, but then “interesting” things happen.
    • The location is an invalid memory address, e.g. because array is right at the end of a virtual memory page and the next page isn't mapped. The write faults and the program crashes.
    • Demons fly out of your nose. Fortunately most computers lack the requisite hardware.

    What you observed on Windows was that the compiler decided to place the variable i immediately after the array in memory, so array[10] = 0 ended up assigning to i. On Ubuntu and CentOS, the compiler didn't place i there. Almost all C implementations do group local variables in memory, on a memory stack, with one major exception: some local variables can be placed entirely in registers. Even if the variable is on the stack, the order of variables is determined by the compiler, and it may depend not only on the order in the source file but also on their types (to avoid wasting memory to alignment constraints that would leave holes), on their names, on some hash value used in a compiler's internal data structure, etc.
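One way to inspect the layout without writing out of bounds is to compare the addresses directly. The following is a minimal sketch, not something from the question: the function name offset_of_i_from_array is made up for illustration, and it assumes a flat address space where converting a pointer to an integer yields a byte address (typical, but implementation-defined).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reports how many bytes separate i from the start of
   array within this function's own locals. A positive result means i sits
   at a higher address than array[0]. */
intptr_t offset_of_i_from_array(void)
{
    int array[10];
    int i = 0;

    array[0] = 0;  /* touch the array so that it is actually used */

    /* Converting pointers to integers is implementation-defined, but unlike
       array[10] = 0 it never writes outside an object. */
    return (intptr_t)(void *)&i - (intptr_t)(void *)array;
}
```

A result of 10 * sizeof(int) would mean i sits immediately after the array, which is the layout that turns array[10] = 0 into i = 0. The result for this helper need not match the layout the compiler picks for your main, since the layout can differ from function to function and between optimization levels.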

    If you want to find out what your compiler decided to do, you can tell it to show you the assembler code. Oh, and learn to decipher assembler (it's easier than writing it). With GCC (and some other compilers, especially in the Unix world), pass the option -S to produce assembler code instead of a binary. For example, here's the assembler snippet for the loop from compiling with GCC on amd64 with the optimization option -O0 (no optimization), with comments added manually:

    .L3:
        movl    -52(%rbp), %eax           # load i to register eax
        cltq
        movl    $0, -48(%rbp,%rax,4)      # set array[i] to 0
        movl    $.LC0, %edi
        call    puts                      # printf of a constant string was optimized to puts
        addl    $1, -52(%rbp)             # add 1 to i
    .L2:
        cmpl    $10, -52(%rbp)            # compare i to 10
        jle     .L3
    

    Here the variable i is stored 52 bytes below the frame pointer %rbp, while the array starts 48 bytes below it. So this compiler happens to have placed i just before the array; you'd overwrite i if you happened to write to array[-1]. If you change array[i]=0 to array[9-i]=0, you'll get an infinite loop on this particular platform with these particular compiler options.

    Now let's compile your program with gcc -O1.

        movl    $11, %ebx
    .L3:
        movl    $.LC0, %edi
        call    puts
        subl    $1, %ebx
        jne     .L3
    

    That's shorter! The compiler has not only declined to allocate a stack location for i — it's only ever stored in the register ebx — but it hasn't bothered to allocate any memory for array, or to generate code to set its elements, because it noticed that none of the elements are ever used.

    To make this example more telling, let's ensure that the array assignments are performed by providing the compiler with something it isn't able to optimize away. An easy way to do that is to use the array from another file — because of separate compilation, the compiler doesn't know what happens in another file (unless it optimizes at link time, which gcc -O0 or gcc -O1 doesn't). Create a source file use_array.c containing

    void use_array(int *array) {}
    

    and change your source code to

    #include <stdio.h>
    void use_array(int *array);
    
    int main()
    {
      int array[10],i;
    
      for (i = 0; i <=10 ; i++)
      {
        array[i]=0; /*code should never terminate*/
        printf("test \n");
    
      }
      printf("%zu \n", sizeof(array)/sizeof(int));
      use_array(array);
      return 0;
    }
    

    Compile with

    gcc -c use_array.c
    gcc -O1 -S -o with_use_array1.s with_use_array.c
    

    This time the assembler code looks like this:

        movq    %rsp, %rbx
        leaq    44(%rsp), %rbp
    .L3:
        movl    $0, (%rbx)
        movl    $.LC0, %edi
        call    puts
        addq    $4, %rbx
        cmpq    %rbp, %rbx
        jne     .L3
    

    Now the array is on the stack, starting at the top of the stack (the address in %rsp). What about i? It doesn't appear anywhere! The loop counter is kept in the register rbx, but it's not exactly i: it's the address of array[i]. The compiler has decided that since the value of i was never used directly, there was no point in performing arithmetic to calculate where to store 0 during each run of the loop. Instead that address is the loop variable, and the arithmetic to determine the boundaries was performed partly at compile time (multiply 11 iterations by 4 bytes per array element to get 44) and partly at run time, but once and for all before the loop starts (the leaq instruction adds 44 to the array's address to get the end address, held in rbp).
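The same loop shape can be written by hand in C, which may make the assembler easier to follow. This is only an illustrative sketch: the function name zero_by_pointer and the stores counter are invented here, and unlike the listing above the caller is expected to pass the real element count, so the loop stays in bounds.

```c
#include <stddef.h>

/* Hand-written analogue of the optimized loop: the loop variable is the
   address of the current element rather than an index. Zeroes n elements
   and returns how many stores were made. */
size_t zero_by_pointer(int *array, size_t n)
{
    int *p = array;        /* plays the role of %rbx: address of array[i] */
    int *end = array + n;  /* plays the role of %rbp: computed once, up front */
    size_t stores = 0;

    while (p != end) {
        *p = 0;
        p++;               /* advances by sizeof(int) bytes, like addq $4 when int is 4 bytes */
        stores++;
    }
    return stores;
}
```

Calling zero_by_pointer(array, 10) on a 10-element array performs 10 stores. The compiler's version used 11 as the count because the source loop asked for 11 iterations.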

    Even on this very simple example, we've seen how changing compiler options (turn on optimization) or changing something minor (array[i] to array[9-i]) or even changing something apparently unrelated (adding the call to use_array) can make a significant difference to what the executable program generated by the compiler does. Compiler optimizations can do a lot of things that may appear unintuitive on programs that invoke undefined behavior. That's why undefined behavior is left completely undefined. When you deviate ever so slightly from the tracks, in real-world programs, it can be very hard to understand the relationship between what the code does and what it should have done, even for experienced programmers.

  • 2020-12-12 09:36

    You declared int array[10], which means array has indices 0 to 9 (it can hold 10 integer elements in total). But the following loop,

    for (i = 0; i <=10 ; i++)
    

    runs from 0 to 10, which means 11 times. Hence when i = 10 it overflows the buffer and causes undefined behavior.

    So try this:

    for (i = 0; i < 10 ; i++)
    

    or,

    for (i = 0; i <= 9 ; i++)
    
  • 2020-12-12 09:39

    Well, C compilers traditionally do not check array bounds. You can get a segmentation fault if you refer to a location that does not "belong" to your process. However, local variables are allocated on the stack, and depending on how the memory is laid out, the area just beyond the array (array[10]) may still belong to the process's memory segment. In that case no segmentation fault is raised, and that is what you seem to be experiencing. As others have pointed out, this is undefined behavior in C and your code should be considered erroneous. Since you are learning C, you are better off getting into the habit of checking bounds in your code.
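Since the language will not check the index for you, one possible habit is to route writes through a small helper that does. The sketch below is an assumption about how that might look, not part of the original answer; the name store_checked and its signature are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores value at array[index] only if index is valid.
   Returns true on success, false if index is out of range. */
bool store_checked(int *array, size_t len, size_t index, int value)
{
    if (index >= len) {
        return false;  /* refuse instead of writing past the end */
    }
    array[index] = value;
    return true;
}
```

With a 10-element array, store_checked(array, 10, 10, 0) returns false and leaves memory untouched, where array[10] = 0 would be undefined behavior.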

  • 2020-12-12 09:40

    You have a bounds violation, and on the non-terminating platforms, I believe you are inadvertently setting i to zero at the end of the loop, so that it starts over again.

    array[10] is invalid; the array contains 10 elements, array[0] through array[9], and array[10] would be the 11th. Your loop should be written to stop before 10, as follows:

    for (i = 0; i < 10; i++)
    

    Where array[10] lands depends on the implementation, and amusingly, on two of your platforms, it lands on i, which those platforms apparently lay out directly after array. i is set to zero and the loop continues forever. On your other platforms, i may be located before array, or array may have some padding after it.

  • 2020-12-12 09:41

    Accessing array[10] is undefined behavior, as described before. Think about it like this:

    I have 10 items in my grocery cart. They are:

    0: A box of cereal
    1: Bread
    2: Milk
    3: Pie
    4: Eggs
    5: Cake
    6: A 2 liter of soda
    7: Salad
    8: Burgers
    9: Ice cream

    cart[10] is undefined. Some compilers or runtimes may report an out-of-bounds error, but a lot apparently don't. The apparent 11th item is an item not actually in the cart. The 11th item is pointing to what I'm going to call a "poltergeist item." It never existed, but it was there.

    The reason some compilers place i at array[10] or array[11] or even array[-1] lies in your initialization/declaration statement. Some compilers interpret this as:

    • "Allocate 10 blocks of ints for array[10] and another int block. To make it easier, put them right next to each other."
    • Same as before, but move it a space or two away, so that array[10] doesn't point to i.
    • Do the same as before, but allocate i at array[-1] (because an index of an array can't, or shouldn't, be negative), or allocate it at a completely different spot because the OS can handle it, and it's safer.

    Some compilers want things to go quicker, and some compilers prefer safety. It's all about the context. If I was developing an app for the ancient BREW OS (the OS of a basic phone), for example, it wouldn't care about safety. If I was developing for an iPhone 6, then it could run fast no matter what, so I would need an emphasis on safety. (Seriously, have you read Apple's App Store Guidelines, or read up on the development of Swift and Swift 2.0?)

  • 2020-12-12 09:41

    Since you created an array of size 10, the for loop condition should be as follows:

    int array[10], i;

    for (i = 0; i < 10; i++)
    {
        array[i] = 0; /* indices 0 through 9 only */
    }
    

    Currently you are trying to access a location outside the array using array[10], which causes undefined behavior. Undefined behavior means the behavior of your program is unpredictable, so it can give different results on each execution.
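To avoid repeating the literal 10 in both the declaration and the loop, the bound can be derived from the array itself. This is a common idiom rather than anything from the question; the macro name ARRAY_LEN and the function zero_local_array are chosen here for illustration, and the macro only works on a true array, not on a pointer.

```c
#include <stddef.h>

/* Number of elements in a true array (does not work on a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Zeroes a local 10-element array and returns the number of iterations. */
size_t zero_local_array(void)
{
    int array[10];
    size_t count = 0;

    /* i < ARRAY_LEN(array) stops at index 9, the last valid element */
    for (size_t i = 0; i < ARRAY_LEN(array); i++) {
        array[i] = 0;
        count++;
    }
    return count;
}
```

If the declaration later changes to int array[20], the loop bound follows automatically.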
