Difference between dereferencing pointer and accessing array elements

此生再无相见时 submitted on 2019-11-27 19:25:42
bolov

Thanks to the link provided by @tesseract in the comments, Expert C Programming: Deep C Secrets (page 96), I came up with a simple answer (a dumbed-down version of the explanation in the book; for a full academic answer, read the book):

  • when declared int a[2]:
    • the compiler has an address for a, where this variable is stored. This address is also the address of the array, since the type of the variable is array.
    • Accessing a[1] means:
      • retrieving that address,
      • adding the offset and
      • accessing the memory at this computed new address.
  • when declared int *b:
    • the compiler also has an address for b but this is the address of the pointer variable, not the array.
    • So accessing b[1] means:
      • retrieving that address,
      • accessing that location to get the value of b, i.e. the address of the array
      • adding an offset to this address and then
      • accessing the final memory location.
// in file2.c

extern int *b; // b is declared as a pointer to an integer

// in file1.c

int b[2] = {100, 101}; // b is defined and initialized as an array of 2 ints

The linker binds both to the same memory address. However, since the symbol b has different types in file1.c and file2.c, the same memory location is interpreted differently.

// in file2.c

int x2;  // assuming sizeof(int) == 4
x2 = b[1]; // b[1] == *(b+1) == *(100 + 1) == *(104) --> segfault

b[1] is first evaluated as *(b+1). In file2.c that means: load the value stored at the location b is bound to, add 1 to it using pointer arithmetic to get a new address, load the value at that address into a CPU register, and store it at the location x2 is bound to. The value stored at b's location is 100 (assuming pointers are the same size as int here; with 8-byte pointers the first two array elements would be read together as one address). Adding 1 gives 104, since sizeof *b is 4, and the program then reads from address 104. This is undefined behaviour and will most likely crash the program.

There is a difference in how the elements of an array are accessed and how the values pointed to by a pointer are accessed. Let's take an example.

int a[] = {100, 800};
int *b = a;

a is an array of 2 integers and b is a pointer to an integer, initialized to the address of the first element of a. When a[1] is accessed, it means: get whatever is at offset 1 from the address of a[0], which is the address the symbol a is bound to. The base address is known to the compiler and linker, so on typical hardware this can be a single load instruction. It is as if the address were embedded in the array symbol, so the CPU can fetch an element at an offset from the base in one step.

When you access *b, b[0] or b[1], you first read the content of b, which is an address, then do the pointer arithmetic to get a new address, and then read whatever is at that address. So the CPU first has to load the content of b, evaluate b+1 (for b[1]), and then load the content at address b+1. That is typically two load instructions.

For an extern array declaration, you don't need to specify the size. The only requirement is that the declaration must match the external definition. Therefore both of the following declarations are equivalent:

extern int a[2];  // equivalent to the below statement
extern int a[];

The type of the variable in its declaration must match the type in its external definition. The linker doesn't check the types of variables when resolving symbol references. (In C++, function parameter types are typically encoded into the symbol name through name mangling; a C object symbol carries no type information.) Therefore you won't get any warning or error, and the mismatched code compiles and links just fine.

Technically, the linker or some compiler component could track what type the symbol represents, and then give an error or warning. But there is no requirement from the standard to do so. You are required to do the right thing.

This doesn't fully answer your question, but it gives you a hint about what is going on. Modify your code a little bit to give:

//file1.c

int a[2] = {800, 801};
int b[2] = {255, 255};

//file2.c

#include <stdio.h>

extern int a[2];

// here b is declared as pointer,
// although the external unit declares it as an array

extern int *b; 
int *c;

int main() {

  int x1, x2;

  x1 = a[1]; // ok
  c = b;
  printf("allocated x1 OK\n");
  printf("a is %p\n", (void *)a); // %p expects a void *
  printf("c is %p\n", (void *)c);
  x2 = *(c+1);
  printf("%d %d\n", x1, x2);
  return 0;
}

Now, when you run it, you still get a segfault. But just before you do, you get an insight into why:

allocated x1 OK
a is 0x10afa4018
c is 0xff000000ff
Segmentation fault: 11

The value of pointer c is not what you expect: instead of being the pointer to the start of array b (which would be a sensible memory location close to a), it contains the contents of array b. The two 4-byte ints 255 and 255, read together as one 8-byte pointer, give 0x000000ff000000ff (0xff is 255 in hex, of course).

I can't explain very clearly why that is - for that, see the link that was given by @tesseract in the comments (really all of chapter 4 is extremely useful).
