Question
int a[] = {1, 2, 3};
I understand that array names are converted to pointers. A term often used is that they decay to pointers.
However, to me, a pointer is a region of memory that holds the address of another region of memory, so:
int *p = a;
can be drawn like this:
-----              -----
  p   ---------->  a[0]  .....
-----              -----
 0x1                0x9
But a itself is not pointing to another region of memory; it IS the region of memory itself.
So when the compiler converts it to a pointer, does it save it (like p) somewhere in memory, or is it an implicit conversion?
Answer 1:
"But a itself is not pointing to another region of memory, it IS the region of memory itself. So when the compiler converts it to a pointer, does it save it (like p) somewhere in memory or it's an implicit conversion?"
It is an implicit conversion. The compiler does not create a separate pointer object in memory (one that you could, for example, reassign to a different address) to hold the address of the first element.
The standard states (emphasis mine):
"Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined."
Source: ISO/IEC 9899:2018 (C18), 6.3.2.1/3
The array is converted to an expression of pointer type, and that expression is not an lvalue.
The compiler just evaluates a to &a[0] (a pointer to a[0]).
"I understand that array names are converted to pointers."
An array does not always convert to a pointer to its first element. Look at the first part of the quote above. For example, when used as &a, a does not decay to a pointer to its first element. Rather, &a yields a pointer to the whole array, of type int (*)[3].
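To make the exceptions concrete, here is a minimal sketch. The function names are made up for illustration; only the array a comes from the question. It compares what sizeof sees with how far a + 1 and &a + 1 move.

```c
#include <stddef.h>

static int a[] = {1, 2, 3};

/* Operand of sizeof: no decay, so this is the size of the whole array. */
size_t size_of_array(void)
{
    return sizeof a;
}

/* In a + 1 the array decays to int *, so the step is one element. */
size_t step_of_decayed_pointer(void)
{
    return (size_t)((char *)(a + 1) - (char *)a);
}

/* Operand of unary &: no decay. &a has type int (*)[3],
   so &a + 1 steps over the entire array. */
size_t step_of_array_pointer(void)
{
    return (size_t)((char *)(&a + 1) - (char *)&a);
}
```

The first and third functions return 3 * sizeof(int); the second returns sizeof(int).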
Answer 2:
C has objects and values.
A value is an abstract concept—it is some meaning, often mathematical. Numbers have values like 4, 19.5, or −3. Addresses have values that are locations in memory. Structures have values that are the values of their members considered as an aggregate.
Values can be used in expressions, such as 3 + 4*5. When values are used in expressions, they do not have any memory locations in the computing model that C uses. This includes values that are addresses, such as &x in &x + 3.
Objects are regions of memory whose contents can represent values. The declaration int *p = &x defines p to be an object. Memory is reserved for it, and it is initialized with the value &x.
For an array declared with int a[10], a is an object; it is all the memory reserved for 10 int elements.
When a is used in an expression, other than as the operand of sizeof or unary &, the a used in the expression is automatically converted to the address of its first element, &a[0]. This is a value. No memory is reserved for it; it is not an object. It may be used in expressions as a value without any memory ever being reserved for it. Note that the actual a is not converted in any way; when we say a is converted to a pointer, we mean only that an address is produced for use in the expression.
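A small sketch of the object/value distinction, with illustrative function names that are not from the question. The pointer p is an object with its own address, while the address that a converts to is just the location of the array itself.

```c
static int a[10];
static int *p = a;   /* p is an object: memory is reserved to hold an address */

/* The value that a converts to is the address of a[0]. */
int decays_to_first_element(void)
{
    return a == &a[0];
}

/* &a and the converted a denote the same location (with different types),
   so nothing separate holds "the pointer" for a. */
int array_address_matches_value(void)
{
    return (void *)&a == (void *)a;
}

/* p, by contrast, lives somewhere other than the memory it points to. */
int pointer_object_is_separate(void)
{
    return (void *)&p != (void *)p;
}
```

All three functions return 1.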
All of the above describes semantics in the computing model C uses, which is that of some abstract computer. In practice, when a compiler works with expressions, it often uses processor registers to manipulate the values in those expressions. Processor registers are a form of memory (they are things in a device that retain values), but they are not the “main memory” we often mean when we speak of “memory” without qualification. However, a compiler may also not have the values in any memory at all because it calculates the expression in part or in full during compilation, so the expression that is actually computed when the program is executing might not include all the values that are nominally in the expression as it is written in C. And a compiler might also have the values in main memory because computing a complicated expression might overflow what is feasible in the processor registers, so that parts of the expression have to be temporarily stored in main memory (often on a hardware stack).
Answer 3:
But a itself is not pointing to another region of memory, it IS the region of memory itself. So when the compiler converts it to a pointer, does it save it (like p) somewhere in memory or it's an implicit conversion?
Logically speaking, it's an implicit conversion - there's no requirement that the implementation materialize permanent storage for the pointer.
In terms of implementation, it's up to the compiler. For example, here's a simplistic bit of code that creates an array and prints its address:
#include <stdio.h>

int main( void )
{
  int arr[] = { 1, 2, 3 };
  printf( "%p", (void *) arr );
  return 0;
}
When I use gcc to compile it for x86-64 on a Red Hat system, I get the following machine code:
GAS LISTING /tmp/ccKF3mdz.s                     page 1
   1                    .file   "arr.c"
   2                    .text
   3                    .section .rodata
   4                .LC0:
   5 0000 257000        .string "%p"
   6                    .text
   7                    .globl  main
   9                main:
  10                .LFB0:
  11                    .cfi_startproc
  12 0000 55            pushq   %rbp
  13                    .cfi_def_cfa_offset 16
  14                    .cfi_offset 6, -16
  15 0001 4889E5        movq    %rsp, %rbp
  16                    .cfi_def_cfa_register 6
  17 0004 4883EC10      subq    $16, %rsp
  18 0008 C745F401      movl    $1, -12(%rbp)
  18      000000
  19 000f C745F802      movl    $2, -8(%rbp)
  19      000000
  20 0016 C745FC03      movl    $3, -4(%rbp)
  20      000000
  21 001d 488D45F4      leaq    -12(%rbp), %rax
  22 0021 4889C6        movq    %rax, %rsi
  23 0024 BF000000      movl    $.LC0, %edi
  23      00
  24 0029 B8000000      movl    $0, %eax
  24      00
  25 002e E8000000      call    printf
  25      00
  26 0033 B8000000      movl    $0, %eax
  26      00
  27 0038 C9            leave
  28                    .cfi_def_cfa 7, 8
  29 0039 C3            ret
  30                    .cfi_endproc
  31                .LFE0:
  33                    .ident  "GCC: (GNU) 7.3.1 20180712 (Red Hat 7.3.1-6)"
  34                    .section .note.GNU-stack,"",@progbits
Line 17 allocates space for the array by subtracting 16 from the stack pointer (yes, there are only 3 elements in the array, which should only require 12 bytes - I'll let someone with more familiarity with the x86_64 architecture explain why, 'cause I'll get it wrong).
Lines 18, 19, and 20 initialize the contents of the array. Note that there's no arr variable in the machine code - it's all done in terms of an offset from the current frame pointer.
Line 21 is where the conversion occurs - we load the effective address of the first element of the array (which is the address stored in the %rbp register minus 12) into the %rax register. That value (along with the address of the format string) then gets passed to printf. Note that the results of this conversion aren't stored anywhere other than the register, so it will be lost the next time something writes to %rax - IOW, no permanent storage has been set aside for it the same way storage has been set aside for the array contents.
Again, that's how gcc on Red Hat running on x86-64 does it. A different compiler on a different architecture may do it differently.
Answer 4:
Here's what the 2011 ISO C Standard says (6.3.2.1p3):
Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type “array of type” is converted to an expression with type “pointer to type” that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
The standard uses the word "converted" here, but it's not the usual kind of conversion.
Normally, a conversion (either an implicit conversion, or an explicit conversion specified by a cast operator) takes an expression of some type as its operand, and yields a result of the target type. The result is determined by the value of the operand. In most or all cases, you could write a function that does the same thing. (Note that both implicit and explicit conversions perform the same operation; the fact that array-to-pointer conversion is implicit isn't particularly relevant.)
In the case of the array-to-pointer conversion described above, that's not the case. The value of an array object consists of the values of its elements -- and that value contains no information about the address at which the array is stored.
It probably would have been clearer to refer to this as an adjustment rather than a conversion. The standard uses the word "adjusted" to refer to the compile-time transformation of a parameter of array type to a parameter of pointer type. For example, this:
void func(int notReallyAnArray[42]);
really means this:
void func(int *notReallyAnArray);
The "conversion" of an array expression to a pointer expression is a similar kind of thing.
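A minimal sketch of the parameter adjustment, assuming a C11 compiler (it uses _Generic); the function names here are hypothetical. The two declarations of sum are accepted as declaring the same function, and inside the function the parameter really has type int *.

```c
#include <stddef.h>

/* These two declarations are compatible: the array parameter
   is adjusted to int * at compile time. */
int sum(int values[42], size_t count);
int sum(int *values, size_t count);

int sum(int values[42], size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];   /* the 42 places no limit on count */
    return total;
}

/* Returns 1 if the parameter is a pointer object:
   &values then has type int **, not int (*)[42]. */
int parameter_is_pointer(int values[42])
{
    return _Generic(&values, int **: 1, default: 0);
}
```

Calling sum on a three-element array with count 3 works fine, and parameter_is_pointer returns 1.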
On the other hand, the word "conversion" doesn't only mean type conversions. For example, the standard uses the word "conversion" when discussing printf format strings ("%d" and "%s" are conversion specifications).
Once you understand that the "conversion" being described is really a compile-time adjustment, converting one kind of expression to another kind of expression (not value), it's much less confusing.
DIGRESSION:
One interesting thing about the standard's description of array-to-pointer conversion is that it talks about an expression of array type, but the behavior depends on the existence of "the array object". An expression of a non-array type doesn't necessarily have an object associated with it (i.e., it's not necessarily an lvalue). But every array expression is an lvalue. And in one case (the name of an array member of a non-lvalue union or structure expression, particularly when a function returns a structure value), the language had to be updated to guarantee that that's always the case, and the concept of temporary lifetime had to be introduced in the 2011 standard. The semantics of referring to the name of an array member of a structure returned by a function call were not at all clear in the 1990 and 1999 standards.
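For illustration, here is the kind of expression the digression is about, assuming C11 semantics. The struct and function names are invented for this sketch.

```c
struct triple {
    int values[3];
};

struct triple make_triple(void)
{
    struct triple t = { { 10, 20, 30 } };
    return t;
}

int second_of_returned_triple(void)
{
    /* make_triple() is not an lvalue, yet .values is an array expression
       that is converted to a pointer. In C11 the returned structure has
       temporary lifetime, which lasts until the end of this full
       expression, so the indexing is well defined. */
    return make_triple().values[1];
}
```

Under those rules second_of_returned_triple returns 20. Keeping the pointer past the end of the full expression would not be valid.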
Source: https://stackoverflow.com/questions/62345429/is-the-array-to-pointer-decay-changed-to-a-pointer-object