Why does dereferencing a pointer to string (char array) return the whole string instead of the first character?

Question

Since the pointer to array points to the first element of the array (having the same address), I don't understand why this happens:

#include <stdio.h>

int main(void) {    
    char (*t)[] = {"test text"};
    printf("%s\n", *t + 1); // prints "est text"
}

Additionally, why does the following code print 2 then?

#include <stdio.h>

int main(void) {    
    char (*t)[] = {1, 2, 3, 4, 5};
    printf("%d\n", *t + 1); // prints "2"
}

Answer 1:

All other answers at the moment of writing this answer were incorrect. Moreover, your question smells like an XY problem, in that the construct you were trying most probably wasn't what you wanted. What you'd really want to do is simply:

char *t = "test text";
printf("%s\n", t);  // prints "test text"

printf("%c\n", t[1]); // prints "e", the 2nd character in the string.

But since you wanted to understand why those things happen, and all the other explanations were wrong, here goes:

Your declaration declares t as a pointer to an array of char:

cdecl> explain char (*t)[];
declare t as pointer to array of char

not an array of pointers as others have suggested. Furthermore, the type of *t is incomplete, so you cannot take its size:

sizeof *t;

will result in

error: invalid application of ‘sizeof’ to incomplete type ‘char[]’
     sizeof *t;

at compile time.

Now, when you try to initialize this with

 char (*t)[] = {"test text"};

it will warn because, while "test text" is an array of char (one that must not be modified), here it decays to a pointer to char. Additionally, the braces there are useless; the excerpt above is equivalent to writing:

char (*t)[] = "test text";

in the same way that

int a = 42;

and

int a = {42};

are synonymous. This is C.

To get a pointer to array, you must apply the address-of operator (&) to the array (the string literal!), which prevents it from decaying to a pointer:

char (*t)[] = &"test text";

Now t is properly initialized as a pointer to an (immutable) array of char. However, in your case using a pointer of the wrong type didn't matter, because the two pointers, despite being of incompatible types, pointed to the same address: one pointed to the array of char, the other to the first character in that array. Thus the observed behaviour was identical.

When you dereference t, which is a pointer-to-array-of-char, you get a locator value (lvalue) of type array-of-char. An lvalue of array type then, under normal circumstances, decays to a pointer to its first element, so *t + 1 points to the second character in that array. Passing that value to printf with %s prints the contents of the 0-terminated string starting at that pointer.

The behaviour of %s is specified in C11 (n1570) as

[%s]

If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type. Characters from the array are written up to (but not including) the terminating null character. [...] If the precision is not specified or is greater than the size of the array, the array shall contain a null character. [...]

(emphasis mine.)

As for your second initialization:

char (*t2)[] = {1, 2, 3, 4, 5};

if you compile this with a recent version of GCC you will get lots of warnings by default. First:

test.c:10:19: warning: initialization makes pointer from integer without a cast [-Wint-conversion]
   char (*t2)[] = {1, 2, 3, 4, 5};
                   ^

Thus 1 is converted from int to a pointer-to-array-of-char without any cast.

Then, of the remaining values, the compiler will complain:

y.c:10:19: note: (near initialization for ‘t2’)
y.c:10:21: warning: excess elements in scalar initializer
   char (*t2)[] = {1, 2, 3, 4, 5};
                      ^

That is, in your case the excess initializers 2, 3, 4 and 5 were discarded.

The value of that pointer is thus now 1; for example, on an x86 flat memory model it would point to memory location 1 (though this is naturally implementation-defined):

printf("%p\n", (void*)t2);

prints (doubly implementation defined)

0x1

When you dereference this value (which is a pointer-to-array-of-char), you will get an lvalue for array-of-char that starts at memory address 1. When you add 1, this array-of-char lvalue will decay to a pointer-to-char, and as a result you will get ((char*)1) + 1 which is a pointer-to-char whose value is 2. The type of that value can be verified from the warning generated by default by GCC (5.4.0):

y.c:5:10: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘char *’ [-Wformat=]
   printf("%d\n",*t2+1); //prints "2"
          ^

The argument is of type char *.

Now you pass (char*)2 as an argument to printf, to be converted using %d, which expects an int. This has undefined behaviour; in your case the bit pattern of (char*)2 happened to be read back as the int 2, and so 2 was printed.

And now one realizes that the value printed has nothing to do with 2 in the original initializer:

#include <stdio.h>

int main(void) {
    char (*t2)[] = {1, 42};
    printf("%d\n", *t2 + 1);
}

will still print 2, not 42. QED.

Alternatively for both initializations you could have used the C99 compound literals to initialize:

// Warning: this code is super *evil*
char (*t)[] = &(char []) { "test text" };
char (*t2)[] = &(char []) { 1, 2, 3, 4, 5 };

Though this would probably be even less what you wanted, and the resulting code has no chance of compiling with C89 or C++ compilers.

Answer 2:

*t will fetch you the first element, then you add 1, and because of pointer arithmetic this means, advance one element, which explains why you get the second element.

Now in the first case you print with %s, which says "print me the string" (until the null terminator is met), while in the second you print with %d, just a number.

If you would like to experience equivalent behavior print with %c in the first case too, which will require a cast, of course.

By the way, as already mentioned, one would usually not do:

char (*t)[] = {"test text"};

which creates an array of pointers, with the first element being the string, which should raise a warning:

C02QT2UBFVH6-lm:~ gsamaras$ gcc -Wall main.c 
main.c:4:18: warning: incompatible pointer types initializing 'char (*)[]' with an expression of type 'char [10]'
      [-Wincompatible-pointer-types]
  char (*t)[] = {"test text"};
                 ^~~~~~~~~~~

As Olaf mentioned, this:

char (*t)[] = {&"test text"};

will make the warning go away, since you are now assigning the address of the string to the pointer.

Now try to think what this will print:

#include <stdio.h>

int main(void) {
  char (*t)[] = {&"test text"};
  printf("%s\n", *t + 1);
  printf("%c\n", *(*t + 1));

  return 0;
}

The first will do what you expect, while the second needs an extra dereference to actually get the character.

But something like this is usual:

char t[] = "test text";

or of course other approaches.

So, in that case, what will this program print?

#include <stdio.h>

int main(void) {
  char t[] = "test text";
  printf("%s\n", t + 1); 
  printf("%c\n", *(t + 1));
  return 0;
}

The first printf() takes t, which decays to a pointer to the first element of the array, i.e. the first character of the string. You then add one to it, and because it is a pointer, pointer arithmetic advances it to the next element (with +1 it advances one element; with +2 it would advance two elements, and so on).

Now, as I explained above, %s will print the whole string, from the pointer passed as printf()'s argument until it reaches the null terminator of the string.

As a result, this will print "est text".

The second printf() follows the same exact philosophy, but its argument is preceded by the * operator, which means give me the element that is pointed, i.e. the second character of the string.

Since we use %c, it will just print that character, i.e. "e".

Answer 3:

In C, strings are just arrays of chars terminated by a \0 character. When you do:

char (*t)[] = {"test text"};

You're creating an array of pointers, and you fill in the first element with "test text", which is a pointer to a zero-terminated char array the compiler will create for you. When you dereference t you get a pointer to the string, then you add 1 which makes it point to the second character and %s will print everything up to the zero terminator.

You could also write:

char t[] = "test text";
printf("%s\n", t + 1);

Or:

char t[] = {'t', 'e', 's', 't', ' ', 't', 'e', 'x', 't', '\0'};
printf("%s\n", t + 1);

Or even, if you don't want to modify the string:

const char *t = "test text";
printf("%s\n", t + 1);

To print a single character, use %c (passing in a char, not a pointer, so it would be *(*t+1) in your code or just t[1] in my examples, which is what you're doing with %d).

Source: https://stackoverflow.com/questions/39155012/why-does-dereferencing-a-pointer-to-string-char-array-returns-the-whole-string

Tags

arrays

string

pointers

pointer-arithmetic