C standard regarding pointer arithmetic outside arrays

Submitted by 时光毁灭记忆、已成空白 on 2021-01-27 04:38:52

Question


I have read a lot about pointer arithmetic and undefined behavior. The conclusion is always the same: pointer arithmetic is well defined only within an array, between &array[0] and &array[array_size] (the pointer one element past the end is valid to form under the C standard, though not to dereference).

My question is: does this mean that when the compiler sees pointer arithmetic not related to any array (undefined behavior), it may emit whatever it wants (even nothing)? Or is it more of a high-level "undefined behavior", meaning that you could reach unmapped memory, garbage data, etc., and there is no guarantee about the validity of the address?

In this example:

char test[10];
char * ptr = &test[0];
printf("test[-1] : %d\n", *(ptr-1));

By "undefined behavior", is it just that the value is not guaranteed at all (it could be garbage, unmapped memory, etc.) while we can still say with certainty that we are accessing the memory address 1 byte before the start of the array? Or is it "undefined behavior" in the sense that the compiler may simply not emit this code at all?

Another simple use case: you want to compute the in-memory size of a function. A naïve implementation could be the following code, assuming that the functions are emitted in the binary in the same order as in the source, contiguously, and without any padding in between.

#include <stdint.h>
#include <stdio.h>

void func1()
{}

void func2()
{}

int main()
{
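  /* Converting a function pointer to an object pointer is not defined by
     ISO C; many compilers accept it as an extension. */
  uint8_t * ptr1 = (uint8_t*) &func1;
  uint8_t * ptr2 = (uint8_t*) &func2;

  /* A pointer difference has type ptrdiff_t, which is printed with %td. */
  printf("Func 1 size : %td\n", ptr2-ptr1);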

  return 0;
}

Since ptr1 and ptr2 are not part of the same array, the subtraction is undefined behavior. Again, does this mean the compiler may not emit that code? Or does "undefined behavior" mean that the subtraction still occurs as expected but its result is meaningless depending on the system (functions not contiguous in memory, padding, etc.)? Is there any well-defined way to compute the difference between two unrelated pointers?


Answer 1:


The C standard doesn't define degrees of undefinedness for undefined behavior. If behavior is undefined, all bets are off.

Additionally, modern compilers track pointer provenance: the compiler watches whether a pointer, even one holding a possibly valid address, was derived correctly from its object, and if it wasn't, the compiler can adjust program behavior.

If you want plain integer arithmetic on addresses without the UB of out-of-bounds pointer arithmetic, you can cast your pointers to uintptr_t before doing the math. The integer values you get are implementation-defined, and a pointer converted back from such an integer is still subject to the usual rules when you dereference it.


As an example of provenance tracking:

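#include <stdio.h>
int main(void)
{
    char a,b;
    /* %p expects a void *, so cast explicitly */
    printf("&a=%p\n", (void *)&a);
    printf("&b=%p\n", (void *)&b);
    /* one past a single object is a valid pointer to form */
    printf("&a+1=%p\n", (void *)(&a+1));
    printf("&b+1=%p\n", (void *)(&b+1));
    printf("%d\n", &a+1==&b || &b+1==&a);
}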

on my machine, compiled with gcc -O2, results in:

&a=0x7ffee4e36cae
&b=0x7ffee4e36caf
&a+1=0x7ffee4e36caf
&b+1=0x7ffee4e36cb0
0

I.e., &a+1 has the same numerical address as &b but is treated as unequal to &b because the addresses are derived from different objects.

(This gcc optimization is somewhat controversial. It doesn't carry across function call / translation unit boundaries, clang doesn't do it, and it's not necessary as 6.5.9p6 does allow for accidental pointer equality. See dbush's to this Keith Thompson's answer for more details.)
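The uintptr_t suggestion earlier in this answer can be sketched roughly as follows. This is only an illustration: address_delta and same_address are hypothetical helper names, not something from the thread, and the code assumes an implementation that provides uintptr_t (it is an optional type) and maps pointers to integers in the usual flat way.

```c
#include <stdint.h>

/* Hypothetical helper: numeric difference between two addresses.
   The pointer-to-integer conversions are implementation-defined.
   The subtraction is done on unsigned integers, so it is well defined
   and wraps around if 'to' is numerically below 'from'. */
static uintptr_t address_delta(const void *from, const void *to)
{
    uintptr_t a = (uintptr_t)from;
    uintptr_t b = (uintptr_t)to;
    return b - a;
}

/* Hypothetical helper: compares two addresses as plain integers, so the
   comparison is made on numeric values rather than on pointers. */
static int same_address(const void *p, const void *q)
{
    return (uintptr_t)p == (uintptr_t)q;
}
```

Calling address_delta(&a, &b) on two unrelated objects involves no pointer arithmetic at all, so it avoids that particular UB. What the number means (for example, whether the objects are adjacent) still depends entirely on the implementation.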




Answer 2:


The C standard has to say undefined behavior simply because things like memory mapping are beyond the scope of the standard.

This applies not only to the rule that array indexing is the only allowed form of pointer arithmetic, but also to the C concept of "effective type", which can be described as the compiler's internal list of which types are actually stored at any given address it knows about. Accessing parts of memory that the compiler doesn't know about is essentially undefined behavior too.

If you look at the average embedded system, you frequently need to access addresses where there are no arrays, and as far as the compiler knows, no objects at all (memory-mapped registers etc). Therefore all such embedded C compilers have guarantees that such code behaves predictably, even though such guarantees are "non-standard extensions". Which in practice means that pointers boil down to integer numbers representing physical addresses.

Best practice is to write code that is safe no matter what. For example, if we are to write a program that dumps the contents of a flash memory page, we want to iterate over it byte by byte (to send the result out on some serial bus). With the average embedded systems compiler, it is safe to simply set a volatile const uint8_t* to the first byte of the flash page and then iterate away, regardless of which variables and types happen to be stored there. But from C's point of view, this is undefined behavior.

We can satisfy both the requirements of C and those of the real world by placing all variables to be allocated in that page inside one enormous struct foo { ... } bar;, which we are allowed to iterate over byte by byte using a pointer to a character type such as uint8_t (C17 6.3.2.3/7).
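A minimal sketch of that idea, under stated assumptions: struct page_vars and copy_object_bytes are hypothetical names, the members are placeholders for whatever really lives in the page, and the output buffer stands in for the serial bus. The sketch uses unsigned char, which is always a character type; uint8_t is normally a typedef for it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout standing in for "all variables in the page". */
struct page_vars {
    uint32_t counter;
    uint16_t flags;
    uint8_t  name[10];
};

/* Walks one object byte by byte through a character-type pointer and
   copies each byte to 'out' (a stand-in for a serial transmit routine).
   Every access stays inside the single object that 'obj' points to.
   Returns the number of bytes copied. */
static size_t copy_object_bytes(const void *obj, size_t size,
                                unsigned char *out)
{
    const unsigned char *p = obj;   /* character-type view of the object */
    for (size_t i = 0; i < size; ++i)
        out[i] = p[i];
    return size;
}
```

With a struct page_vars bar;, a call such as copy_object_bytes(&bar, sizeof bar, buffer) visits every byte of bar, padding included, without stepping outside the object.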

So the effort of dodging undefined behavior isn't necessarily that cumbersome. There are often workarounds with structs, unions, converting pointers to integers, etc.




Answer 3:


Actually, to prove that any arbitrary pointer arithmetic is "unrelated to any array" is very hard (maybe similar to the Halting Problem? Not sure) because a pointer can be assigned "sneakily", through global variable, pointer to pointer, looking at the map file to find the pointer's actual address and modify that etc.

In practice, the compiler will probably do the "expected things" in terms of generated code (i.e. the usual pointer arithmetic), but the standard gives no guarantee of that, and the resulting pointer is not guaranteed to point to anything valid. So the behavior is "undefined". In particular, if you declare a variable before and after an array, and your pointer goes even one element before or after the array, you are not guaranteed to be touching those variables or, in fact, any valid memory. On a system with memory protection, it may even crash. The actual behavior depends on the system running the code.
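For contrast, here is a small sketch of pointer arithmetic that the standard does define: every pointer stays within the array or at one past its end, and the one-past-the-end pointer is only compared, never dereferenced. The function name sum_ints is hypothetical and chosen just for this illustration.

```c
#include <stddef.h>

/* Sums n ints starting at a, using only well-defined pointer values. */
static long sum_ints(const int *a, size_t n)
{
    const int *end = a + n;     /* one past the end: valid to form */
    long total = 0;
    for (const int *p = a; p != end; ++p)
        total += *p;            /* p is inside the array on every read */
    return total;
}
```

Forming a - 1 or a + n + 1 inside this function would already be undefined, even if the result were never dereferenced.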




Answer 4:


The C Standards Committee saw no need to forbid compilers from behaving in silly ways that would make them unsuitable for many purposes. Indeed, according to the published Rationale, the Committee recognized that it would be possible for an implementation to behave in a way that was conforming but useless, but judged that people seeking to produce quality implementations of the language the Standard was written to describe would refrain from such silliness. Consider the program:

void byte_copy(unsigned char *dest, unsigned char *src, int len)
{
  while(len--) *dest++ = *src++;
}
unsigned char src[10][10], dest[100];
void test(int mode)
{
  if (mode == 0)
    byte_copy(dest, src[0], 11);
  else
    byte_copy(dest, (unsigned char*)src, 100);
}

It might be useful for an implementation to trap on test if mode is zero, on the basis that the programmer was probably intending to copy elements from the first row of src, and the authors of the Standard probably didn't want to forbid that. On the other hand, the language would be severely broken if code like that in the mode != 0 case couldn't be used to make a bytewise copy of objects of all types including multi-dimensional arrays, and the Committee likely recognized that. Nonetheless, the Standard recognizes no distinction between the pointers passed in the two cases.

Such a distinction would only be necessary if one believed that the language would be broken by allowing implementations to behave in ways that would make them useless. Since the authors of the Standard have said that they recognize that it allows implementations to behave uselessly, but do not believe that such a possibility breaks the language, that would suggest that they might not regard a failure to define behavior of all necessary constructs as a defect in cases where they expected that quality implementations of the language the Standard was written to describe would support such constructs anyway.

As to the question of whether people seeking to write quality implementations of the language the Standard was written to describe can be relied upon to refrain from such silliness, that may be hard to answer without knowing the motives of the people maintaining some compilers.



Source: https://stackoverflow.com/questions/56360316/c-standard-regarding-pointer-arithmetic-outside-arrays
