Question

I wonder whether C++ implementations are allowed to represent pointers to different types differently. For instance, if we had a 4-byte sized/aligned int and an 8-byte sized/aligned long, would it be possible to represent pointers-to-int/long as object addresses shifted right by 2/3 bits, respectively? This would effectively forbid converting a pointer-to-long into a pointer-to-int.

I am asking because of [expr.reinterpret.cast/7]:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)).

[Note 7: Converting a pointer of type “pointer to T1” that points to an object of type T1 to the type “pointer to T2” (where T2 is an object type and the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. — end note]

The first sentence suggests that we can convert between pointers to any two object types. However, the emphasized text in the (non-normative) Note 7 then says that alignment plays some role here as well. (That's why I came up with the int/long example above.)
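The int/long scheme from the question can be modeled with plain integers to see which direction would lose information. This is only a sketch of a hypothetical representation; the function names are made up for illustration and no real implementation is implied.

```cpp
#include <cstdint>

// Hypothetical representations: the byte address divided by the alignment.
constexpr std::uint64_t encode_int_ptr(std::uint64_t byte_addr)  { return byte_addr >> 2; }
constexpr std::uint64_t encode_long_ptr(std::uint64_t byte_addr) { return byte_addr >> 3; }
constexpr std::uint64_t decode_int_ptr(std::uint64_t rep)  { return rep << 2; }
constexpr std::uint64_t decode_long_ptr(std::uint64_t rep) { return rep << 3; }

// long* -> int*: every 8-byte-aligned address is also 4-byte aligned, so this is exact.
constexpr std::uint64_t long_rep_to_int_rep(std::uint64_t rep) { return rep << 1; }

// int* -> long*: the lowest bit of the int representation is dropped,
// so an address that is only 4-byte aligned cannot survive the conversion.
constexpr std::uint64_t int_rep_to_long_rep(std::uint64_t rep) { return rep >> 1; }
```

In this model the lossy direction is the one toward the more strictly aligned type, which matches the alignment condition in Note 7.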

Answer 1:

Yep

As a concrete example, there is a C++ implementation where pointers to single-byte elements are larger than pointers to multi-byte elements, because the hardware uses word (not byte) addressing. To emulate byte pointers, C++ uses a hardware pointer plus an extra byte offset.

void* stores that extra offset, but int* does not. Converting int* to char* works (as it must under the standard), but char* to int* loses that offset (which your note implicitly permits).

The Cray T90 supercomputer is an example of such hardware.
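The word-plus-offset idea described above can be modeled roughly as follows. This is a simplified sketch, not the actual representation used by any Cray compiler; `WordPtr`, `BytePtr`, and the conversion functions are hypothetical names.

```cpp
#include <cstddef>

// Models int*: just a hardware word address.
struct WordPtr { std::size_t word; };

// Models char* and void*: a word address plus a byte offset inside that word.
struct BytePtr { std::size_t word; unsigned byte; };

// int* -> char*: always exact, the offset is simply zero.
constexpr BytePtr to_byte_ptr(WordPtr p) { return {p.word, 0}; }

// char* -> int*: the byte offset has nowhere to go and is discarded.
constexpr WordPtr to_word_ptr(BytePtr p) { return {p.word}; }
```

A `WordPtr` survives the trip through `BytePtr` unchanged, while a `BytePtr` with a nonzero offset does not survive the trip through `WordPtr`.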

I will see if I can find the standards argument for why this is a valid thing for a compliant C++ compiler to do; I am only aware that someone did it, not that it is legal to do it, but that note rather implies it is intended to be legal.

The rules are going to be in the to-from void pointer casting rules. The paragraph you quoted implicitly forwards the meaning of the conversion to there.

7.6.1.9 Static cast [expr.static.cast]

A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.

This demonstrates that converting to a more strictly aligned type yields an unspecified pointer value when the address does not satisfy that alignment, whereas converting to an equally or less strictly aligned type, where no object of that type actually exists, does not change the pointer value.

That is permission for a cast from a pointer to 4-byte-aligned data to a pointer to 8-byte-aligned data to result in garbage.

However, every cast between unrelated object pointer types logically has to go through a void*:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)).

(From the OP)
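As a small illustration of the quoted rules, the sketch below spells out the cast through void* by hand and checks the one round trip that the alignment condition covers (int* to char* and back). The helper names `via_void` and `round_trip_preserved` are made up for this example.

```cpp
// The two-step conversion that reinterpret_cast between object pointers is defined as.
template <class To, class From>
To* via_void(From* p) {
    return static_cast<To*>(static_cast<void*>(p));
}

// char has the weakest alignment requirement, so int* -> char* -> int*
// must yield the original pointer value.
inline bool round_trip_preserved(int* p) {
    char* as_char = via_void<char>(p);
    int* back = via_void<int>(as_char);
    return back == p;
}
```

The opposite round trip (char* to int* and back) carries no such guarantee when the char is not suitably aligned for int.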

That covers void* to T*; I have yet to find the T* to void* conversion text to make this a complete language-lawyer level answer.

Answer 2:

The answer is yes, simply because the standard does not forbid it. An implementation could decide to have different representations for pointers to different types, or even several possible representations for the same pointer.

As most architectures now use flat addressing (meaning that the representation of the pointer is just the address), there is no good reason to do that. But I can still remember the old segment:offset address representation of 8086 systems, which allowed 16-bit systems to process 20-bit addresses (1024 KiB). It used a 16-bit segment address (shifted left by 4 bits to get a real address) plus a 16-bit offset for far pointers, or only a 16-bit offset (relative to the current segment) for near pointers. In this mode, far pointers had a bunch of possible representations for the same address. BTW, far addressing was the default (so what normal source produced) in the large and compact modes.
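The segment:offset arithmetic can be sketched like this. `FarPtr` and `linear_address` are illustrative names, and the sketch ignores the wraparound that real hardware applies above 20 bits.

```cpp
#include <cstdint>

// A far pointer as a (segment, offset) pair, as on 16-bit x86 real mode.
struct FarPtr {
    std::uint16_t segment;
    std::uint16_t offset;
};

// Physical address = segment * 16 + offset.
constexpr std::uint32_t linear_address(FarPtr p) {
    return (static_cast<std::uint32_t>(p.segment) << 4) + p.offset;
}
```

For instance, 0x1234:0x0010 and 0x1235:0x0000 are two different representations that both map to the linear address 0x12350.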

Answer 3:

Pointers represent a memory location. You could potentially set aside some bits of a pointer to store its type. However, this would be limited, and it would also reduce your address space. For example, 32-bit systems are already limited to about 4 GB of addressable memory.

Shifting the bits would potentially be possible, but again, how many different types would you be able to represent with that? You wouldn't be able to tell what type it is from the pointer alone, so the code would still need to know what it is dealing with. Also, the shifted value represents a totally different memory address, which we often do not have much control over unless we use a custom allocator. Undoing the shift before access tackles this last point, but again, your code still needs to know the type to make that possible, so there isn't much advantage.

You can convert pointers to different types. But that does not mean that the address or the data it points to will be converted. Some types require more complex pointer access (virtuals, for example). If you reinterpret an int32 as a uint32, you can expect the data in there to be somewhat predictable: both take 32 bits of memory and store an integer. However, for the signed type, one bit is used to represent negative numbers, while the unsigned type uses that bit to extend the range of positive numbers. Note that in this case, the difference is part of the data, not the pointer.
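To illustrate that the difference lies in how the data is interpreted rather than in the pointer, here is a small sketch that copies the bytes of an int32 into a uint32. The helper name `bits_as_unsigned` is made up for this example.

```cpp
#include <cstdint>
#include <cstring>

// Copy the object representation of a signed 32-bit value into an unsigned one.
// The bytes are identical; only their interpretation changes.
inline std::uint32_t bits_as_unsigned(std::int32_t value) {
    std::uint32_t out;
    std::memcpy(&out, &value, sizeof out);
    return out;
}
```

Non-negative values read back unchanged, while -1 reads back as 0xFFFFFFFF because std::int32_t is a two's complement type.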

C++ safeguards you from implicit conversions between types that do not match, but it still allows you to do anything with your pointers explicitly. Whether that makes sense is partially up to you, the types, the compiler implementation, and the specification.

Source: https://stackoverflow.com/questions/66102053/can-pointers-to-different-types-have-different-binary-representations

Tags

c++