Do C and C++ standards imply that a special value in the address space must exist solely to represent the value of null pointers?

Question

Following the discussion in this question about null pointers in C and C++, I'd like to ask its closing question separately here.

If it can be inferred from the C and C++ standards (answers can target both standards) that dereferencing a pointer variable whose value is equal to the nullptr (or (void *)0) value is undefined behavior, does it imply that these languages require that a special value in the address space is dead, meaning that it's unusable except for the role of representing nullptr? What if the system has a really useful function or data structure at the same address that's equal to nullptr? Should this never happen because it's the compiler writer's responsibility to figure out a non-conflicting null pointer value for each system the compiler targets? Or should a programmer who needs to access such a function or data structure be content to program in "undefined behavior mode" to achieve that goal?

This looks like blurring the lines of the roles of a compiler and a computer system. I would ask whether it's right to do so, but I guess there's no room for this here.

This blog post digs into how to tackle this situation.

Answer 1:

does it imply that these languages require that a special value in the address space is dead, meaning that it's unusable except for the role of representing nullptr?

No.

The compiler needs a special value to represent a null pointer, and must take care that it does not place any object or function at that address, because all pointers to objects and functions are required to compare unequal to the null pointer. The standard library must take similar precautions in its implementation of malloc and friends.

However, if there is something at that address already, something that no strictly conforming program can access, then an implementation is allowed to support dereferencing the null pointer to access it. Dereferencing the null pointer is undefined in standard C, so an implementation can make it do anything it likes, including the obvious.

Both the C and the C++ standards embody the as-if rule, which basically means that if, for every valid input, an implementation is indistinguishable from one that conforms to the standard, then it does conform to the standard. The C standard uses a trivial example:

5.1.2.3 Program execution

10 EXAMPLE 2 In executing the fragment
char c1, c2;
/* ... */
c1 = c1 + c2;
the "integer promotions" require that the abstract machine promote the value of each variable to int size and then add the two ints and truncate the sum. Provided the addition of two chars can be done without overflow, or with overflow wrapping silently to produce the correct result, the actual execution need only produce the same result, possibly omitting the promotions.

Now, if c1 and c2's values come from registers, and it's possible to force values outside of char's range into those registers (e.g. by inline assembly), then the fact that the implementation optimises away the integer promotions might be observable. However, since the only way to observe it is through undefined behaviour or implementation extensions, there is no way for any standard code to be affected by this, and an implementation is allowed to do it.

The same logic applies to getting useful results when dereferencing null pointers. There are only two ways to see, from code, that there is something meaningful at that particular address: getting a null pointer from an evaluation that is guaranteed to produce a pointer to an object, or just trying it. The former is what I mentioned the compiler and standard library must take care of. The latter is not something that can affect a valid standard program.

A well-known example is the interrupt vector table on DOS implementations, which resides at address zero. It is typically accessed simply by dereferencing a null pointer. The C and C++ standards don't, shouldn't and cannot cover access to the interrupt vector table. They do not define such behaviour, but they do not restrict access to it either. Implementations should be and are allowed to provide extensions to access it.

Answer 2:

That depends on what is meant by the phrase "address space". The C standard uses the phrase informally, but doesn't define what it means.

For each pointer type, there must be a value (the null pointer) that compares unequal to a pointer to any object or function. That means, for example, that if a pointer type is 32 bits wide, then there can be at most 2^32 - 1 valid non-null values of that type. There could be fewer than that if some addresses have more than one representation, or if not all representations correspond to valid addresses.

So if you define the "address space" to cover 2^N distinct addresses, where N is the width in bits of a pointer, then yes, one of those values must be reserved as the null pointer value.

On the other hand, if the "address space" is narrower than that (for example, typical 64-bit systems can't actually access 2^64 distinct memory locations), then the value reserved as the null pointer can easily be outside the "address space".

Some things to note:

The representation of a null pointer may or may not be all-bits-zero.
Not all pointer types are necessarily the same size.
Not all pointer types necessarily use the same representation for a null pointer.

On most modern implementations, all pointer types are the same size, and all represent a null pointer as all-bits-zero, but there are valid reasons to, for example, make function pointers wider than object pointers, or make void* wider than int*, or use a representation other than all-bits-zero for the null pointer.

This answer is based on the C standard. Most of it also applies to C++. (One difference is that C++ has pointer-to-member types, which are typically wider than ordinary pointers.)

Answer 3:

does it imply that these languages require that a special value in the address space is dead, meaning that it's unusable except for the role of representing nullptr?

Yes.

C places requirements on the null pointer that distinguish it from pointers to objects:

(C11, 6.3.2.3p3) "[...] If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function."

What if the system has a really useful function or data structure at the same address that's equal to nullptr? Should this never happen because it's the compiler writer's responsibility to figure out a non-conflicting null pointer value for each system the compiler targets?

The New C Standard by Derek M. Jones provides the following commentary on implementations:

All bits zero is a convenient execution-time representation of the null pointer constant for many implementations because it is invariably the lowest address in storage. (The INMOS Transputer[632] had a signed address space, which placed zero in the middle.) Although there may be program bootstrap information at this location, it is unlikely that any objects or functions will be placed here. Many operating systems leave this storage location unused because experience has shown that program faults sometimes cause values to be written into the location specified by the null pointer constant (the more developer-oriented environments try to raise an exception when that location is accessed).

Another implementation technique, when the host environment does not include address zero as part of a processes address space, is to create an object (sometimes called _ _null) as part of the standard library. All references to the null pointer constant refer to this object, whose address will compare unequal to any other object or function.

Answer 4:

Yes, that's precisely what it means.

[C++11: 4.10/1]: [..] A null pointer constant can be converted to a pointer type; the result is the null pointer value of that type and is distinguishable from every other value of object pointer or function pointer type. [..]

The null pointer value doesn't need to be 0x00000000, but it does need to be unique; there's no other way to make this rule work.

It's certainly not the only rule of the abstract machine that implicitly emplaces strict limitations upon practical implementations.

What if the OS puts a really useful function or data structure at the same address that's equal to nullptr?

The OS won't do that, but it can be exploited.

Source: https://stackoverflow.com/questions/28574069/do-c-and-c-standards-imply-that-a-special-value-in-the-address-space-must-exis

Tags

c++

language-lawyer

systems-programming