Is the array to pointer decay changed to a pointer object?

Question

int a[] = {1, 2, 3};

I understand that array names are converted to pointers. A term often used is that they decay to pointers.

However to me, a pointer is a region of memory that holds the address to another region of memory, so:

int *p = a;

can be drawn like this:

-----              -----
  p    --------->  a[0].  .....
-----              -----
 0x1                0x9

But a itself is not pointing to another region of memory, it IS the region of memory itself. So when the compiler converts it to a pointer, does it save it (like p) somewhere in memory or it's an implicit conversion?

Answer 1:

"But a itself is not pointing to another region of memory, it IS the region of memory itself."

"So when the compiler converts it to a pointer, does it save it (like p) somewhere in memory or it's an implicit conversion?"

It is an implicit conversion. The compiler does not create a separate pointer object in memory (one that you could, for example, reassign to hold a different address) to store the address of the first element.

The standard states (emphasis mine):

"Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined."

Source: ISO/IEC 9899:2018 (C18), 6.3.2.1/3

The array is converted to an expression of pointer type, and the result is not an lvalue.

The compiler just evaluates a to &a[0] (pointer to a[0]).

"I understand that array names are converted to pointers."

An array does not always convert to a pointer to its first element. Look at the first part of the quote above. For example, when used as &a, a does not decay to a pointer to its first element. Rather, &a yields a pointer to the whole array, of type int (*)[3].

Answer 2:

C has objects and values.

A value is an abstract concept—it is some meaning, often mathematical. Numbers have values like 4, 19.5, or −3. Addresses have values that are locations in memory. Structures have values that are the values of their members considered as an aggregate.

Values can be used in expressions, such as 3 + 4*5. When values are used in expressions, they do not have any memory locations in the computing model that C uses. This includes values that are addresses, such as &x in &x + 3.

Objects are regions of memory whose contents can represent values. The declaration int *p = &x defines p to be an object. Memory is reserved for it, and it is assigned the value &x.

For an array declared with int a[10], a is an object; it is all the memory reserved for 10 int elements.

When a is used in an expression, other than as the operand of sizeof or unary &, the a used in the expression is automatically converted to the address of its first element, &a[0]. This is a value. No memory is reserved for it; it is not an object. It may be used in expressions as a value without any memory ever being reserved for it. Note that the actual a is not converted in any way; when we say a is converted to a pointer, we mean only that an address is produced for use in the expression.

All of the above describes semantics in the computing model C uses, which is that of some abstract computer. In practice, when a compiler works with expressions, it often uses processor registers to manipulate the values in those expressions. Processor registers are a form of memory (they are things in a device that retain values), but they are not the “main memory” we often mean when we speak of “memory” without qualification. However, a compiler may also not have the values in any memory at all because it calculates the expression in part or in full during compilation, so the expression that is actually computed when the program is executing might not include all the values that are nominally in the expression as it is written in C. And a compiler might also have the values in main memory because computing a complicated expression might overflow what is feasible in the processor registers, so that parts of the expression have to be temporarily stored in main memory (often on a hardware stack).

Answer 3:

"But a itself is not pointing to another region of memory, it IS the region of memory itself. So when the compiler converts it to a pointer, does it save it (like p) somewhere in memory or it's an implicit conversion?"

Logically speaking, it's an implicit conversion - there's no requirement that the implementation materialize permanent storage for the pointer.

In terms of implementation, it's up to the compiler. For example, here's a simplistic bit of code that creates an array and prints its address:

#include <stdio.h>

int main( void )
{
  int arr[] = { 1, 2, 3 };
  printf( "%p", (void *) arr );
  return 0;
}

When I use gcc to compile it for x86-64 on a Red Hat system, I get the following machine code:

GAS LISTING /tmp/ccKF3mdz.s             page 1


   1                    .file   "arr.c"
   2                    .text
   3                    .section    .rodata
   4                .LC0:
   5 0000 257000        .string "%p"
   6                    .text
   7                    .globl  main
   9                main:
  10                .LFB0:
  11                    .cfi_startproc
  12 0000 55            pushq   %rbp
  13                    .cfi_def_cfa_offset 16
  14                    .cfi_offset 6, -16
  15 0001 4889E5        movq    %rsp, %rbp
  16                    .cfi_def_cfa_register 6
  17 0004 4883EC10      subq    $16, %rsp
  18 0008 C745F401      movl    $1, -12(%rbp)
  18      000000
  19 000f C745F802      movl    $2, -8(%rbp)
  19      000000
  20 0016 C745FC03      movl    $3, -4(%rbp)
  20      000000
  21 001d 488D45F4      leaq    -12(%rbp), %rax
  22 0021 4889C6        movq    %rax, %rsi
  23 0024 BF000000      movl    $.LC0, %edi
  23      00
  24 0029 B8000000      movl    $0, %eax
  24      00
  25 002e E8000000      call    printf
  25      00
  26 0033 B8000000      movl    $0, %eax
  26      00
  27 0038 C9            leave
  28                    .cfi_def_cfa 7, 8
  29 0039 C3            ret
  30                    .cfi_endproc
  31                .LFE0:
  33                    .ident  "GCC: (GNU) 7.3.1 20180712 (Red Hat 7.3.1-6)"
  34                    .section    .note.GNU-stack,"",@progbits

Line 17 allocates space for the array by subtracting 16 from the stack pointer (yes, there are only 3 elements in the array, which should only require 12 bytes - I'll let someone with more familiarity with the x86_64 architecture explain why, 'cause I'll get it wrong).

Lines 18, 19, and 20 initialize the contents of the array. Note that there's no arr variable in the machine code - it's all done in terms of an offset from the current frame pointer.

Line 21 is where the conversion occurs - we load the effective address of the first element of the array (which is the address stored in the %rbp register minus 12) into the %rax register. That value (along with the address of the format string) then gets passed to printf. Note that the result of this conversion isn't stored anywhere other than the register, so it will be lost the next time something writes to %rax. In other words, no permanent storage has been set aside for it the same way storage has been set aside for the array contents.

Again, that's how gcc on Red Hat running on x86-64 does it. A different compiler on a different architecture may do it differently.

Answer 4:

Here's what the 2011 ISO C Standard says (6.3.2.1p3):

Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type “array of type” is converted to an expression with type “pointer to type” that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.

The standard uses the word "converted" here, but it's not the usual kind of conversion.

Normally, a conversion (either an implicit conversion, or an explicit conversion specified by a cast operator) takes an expression of some type as its operand, and yields a result of the target type. The result is determined by the value of the operand. In most or all cases, you could write a function that does the same thing. (Note that both implicit and explicit conversions perform the same operation; the fact that array-to-pointer conversion is implicit isn't particularly relevant.)

In the case of the array-to-pointer conversion described above, that's not the case. The value of an array object consists of the values of its elements -- and that value contains no information about the address at which the array is stored.

It probably would have been clearer to refer to this as an adjustment rather than a conversion. The standard uses the word "adjusted" to refer to the compile-time transformation of a parameter of array type to a parameter of pointer type. For example, this:

void func(int notReallyAnArray[42]);

really means this:

void func(int *notReallyAnArray);

The "conversion" of an array expression to a pointer expression is a similar kind of thing.

On the other hand, the word "conversion" doesn't only mean type conversions. For example, the standard uses the word "conversion" when discussing printf format strings ("%d" and "%s" are conversion specifications).

Once you understand that the "conversion" being described is really a compile-time adjustment, converting one kind of expression to another kind of expression (not value), it's much less confusing.

DIGRESSION:

One interesting thing about the standard's description of array-to-pointer conversion is that it talks about an expression of array type, but the behavior depends on the existence of "the array object". An expression of a non-array type doesn't necessarily have an object associated with it (i.e., it's not necessarily an lvalue). But every array expression is an lvalue. And in one case (the name of an array member of a non-lvalue union or structure expression, particularly when a function returns a structure value), the language had to be updated to guarantee that that's always the case, and the concept of temporary lifetime had to be introduced in the 2011 standard. The semantics of referring to the name of an array member of a structure returned by a function call were not at all clear in the 1990 and 1999 standards.

Source: https://stackoverflow.com/questions/62345429/is-the-array-to-pointer-decay-changed-to-a-pointer-object

Tags

arrays

pointers

memory

language-lawyer