Is it always safe to convert an integer value to void* and back again in POSIX?

Question

This question is almost a duplicate of some others I've found, but it specifically concerns POSIX and a very common pthreads example that I've encountered several times. I'm mostly concerned with the current state of affairs (i.e., C99 and POSIX.1-2008 or later), but any historical information is of course welcome as well.

The question basically boils down to whether b will always take the same value as a in the following code:

long int a = /* some valid value */;
void *ptr = (void *)a;
long int b = (long int)ptr;

I am aware that this usually works, but the question is whether it is a proper thing to do (i.e., do the C99 and/or POSIX standards guarantee that it will work).

When it comes to C99, it seems they do not. Section 6.3.2.3 says:

5 An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

6 Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

Even using intptr_t the standard seems to only guarantee that any valid void* can be converted to intptr_t and back again, but it does not guarantee that any intptr_t can be converted to void* and back again.

However it is still possible that the POSIX standard allows this.

I have no great desire to use a void* as a storage space for any variable (I find it pretty ugly even if POSIX should allow it), but I feel I have to ask because of the common example use of the pthread_create function where the argument to start_routine is an integer, and it is passed in as void* and converted to int or long int in the start_routine function. For example this manpage has such an example (see link for full code):

//Last argument casts int to void *
pthread_create(&tid[i], NULL, sleeping, (void *)SLEEP_TIME);
/* ... */
void * sleeping(void *arg){
    //Casting void * back to int
    int sleep_time = (int)arg;
    /* ... */
}

I've also seen a similar example in a textbook (An Introduction to Parallel Programming by Peter S. Pacheco). Considering that it seems to be a common example used by people who should know this stuff much better than me, I'm wondering if I'm wrong and this is actually a safe and portable thing to be doing.

Answer 1:

As you say, C99 doesn't guarantee that any integer type may be converted to void* and back again without loss of information. It does make a similar guarantee for intptr_t and uintptr_t defined in <stdint.h>, but those types are optional. (The guarantee is that a void* may be converted to {u,}intptr_t and back without loss of information; there's no such guarantee for arbitrary integer values.)
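The one direction that C99 does guarantee can be sketched as follows. This is a minimal illustration, assuming an implementation that provides the optional intptr_t type; the function name pointer_round_trips is made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if p compares equal to itself after
   the conversion chain void* -> intptr_t -> void*. C99 guarantees this
   for any valid pointer to void, provided intptr_t exists. */
int pointer_round_trips(void *p)
{
    intptr_t as_int = (intptr_t)p;   /* pointer to integer */
    void *back = (void *)as_int;     /* integer back to pointer */
    return back == p;
}
```

Note that the starting point is a pointer. Nothing comparable is promised when the starting point is an arbitrary integer value.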

POSIX doesn't appear to make any such guarantee either.

The POSIX description of <limits.h> requires int and unsigned int to be at least 32 bits. This exceeds the C99 requirement that they be at least 16 bits. (Actually, the requirements are in terms of ranges, not sizes, but the effect is that int and unsigned int must be at least 32 (under POSIX) or 16 (under C99) bits, since C99 requires a binary representation.)

The POSIX description of <stdint.h> says that intptr_t and uintptr_t must be at least 16 bits, the same requirement imposed by the C standard. Since void* can be converted to intptr_t and back again without loss of information, this implies that void* may be as small as 16 bits. Combine that with the POSIX requirement that int is at least 32 bits (and the POSIX and C requirement that long is at least 32 bits), and it's possible that a void* just isn't big enough to hold an int or long value without loss of information.

The POSIX description of pthread_create() doesn't contradict this. It merely says that arg (the void* 4th argument to pthread_create()) is passed to start_routine(). Presumably the intent is that arg points to some data that start_routine() can use. POSIX has no examples showing the usage of arg.
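A minimal sketch of that presumed intent, where arg points at data instead of carrying an integer in disguise. The struct sleeper_args type and the run_sleeper function are hypothetical names chosen for this illustration; only pthread_create() and pthread_join() come from POSIX.

```c
#include <pthread.h>

/* Hypothetical argument block; the names are illustrative, not from POSIX. */
struct sleeper_args {
    int sleep_time;   /* input read by the thread */
    int observed;     /* output written by the thread */
};

static void *sleeping(void *arg)
{
    /* void* converts back to the original object pointer type;
       no integer/pointer conversion is involved. */
    struct sleeper_args *args = arg;
    args->observed = args->sleep_time;
    return NULL;
}

/* Runs one thread. Returns 0 on success and stores the value the
   thread saw in *observed; otherwise returns the pthread error code. */
int run_sleeper(int sleep_time, int *observed)
{
    struct sleeper_args args = { sleep_time, 0 };
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, sleeping, &args);
    if (rc != 0)
        return rc;
    /* args lives on this stack frame, so join before returning. */
    rc = pthread_join(tid, NULL);
    if (rc != 0)
        return rc;
    *observed = args.observed;
    return 0;
}
```

The cost of this approach is that the pointed-to object must stay alive for as long as the thread uses it, which is why the sketch joins the thread before the function returns.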

You can see the POSIX standard here; you have to create a free account to access it.

Answer 2:

The focus in answers so far seems to be on the width of a pointer, and indeed as @Nico points out (and @Quantumboredom also points out in a comment), there is a possibility that intptr_t may be wider than a pointer. @Kevin's answer hints at the other important issue, but doesn't completely describe it.

Also, though I'm not sure of the exact paragraph in the standard, Harbison & Steele point out that intptr_t and uintptr_t are optional types too and may not even exist in a valid C99 implementation. OpenGroup says that XSI-conformant systems must support both types, which means that plain POSIX does not require them (at least as of the 2003 edition).

The part that's really been missed here though is that pointers need not always have a simple numerical representation that matches the internal representation of an integer. This has always been so (since K&R 1978), and I'm pretty sure POSIX is careful not to overrule this possibility either.

So, C99 does require that it be possible to convert a pointer to an intptr_t (if and only if that type exists) and then back to a pointer again, such that the new pointer still points at the same object in memory as the old pointer. If pointers have a non-integer representation, this implies that an algorithm exists which can convert a specific set of integer values into valid pointers. However, this also means that not all integers between INTPTR_MIN and INTPTR_MAX are necessarily valid pointer values, even if the width of intptr_t (and/or uintptr_t) is exactly the same as the width of a pointer.

So, the standards cannot guarantee that any intptr_t or uintptr_t can be converted to a pointer and back to the same integer value, or even which set of integer values can survive such conversion, because they cannot possibly define all of the possible rules and algorithms for converting integer values into pointer values. Doing so even for all known architectures could still prevent the applicability of the standard to novel types of architectures yet to be invented.

Answer 3:

(u)intptr_t are only guaranteed to be large enough to hold a pointer, but they may also be larger, which is why the C99 standard only guarantees the round trip (void*) -> (u)intptr_t -> (void*). In the other direction loss of data may occur (and is considered undefined).

Answer 4:

Not sure what you mean by "always". It's not written anywhere in the standard that this is okay, but there are no systems it fails on.

If your integers are really small (say, limited to 16 bits) you can make it strictly conforming by declaring:

static const char dummy_base[65535];

and then passing (void *)(dummy_base+i) as the argument (the cast drops the const qualifier) and recovering it as i=(const char *)start_arg-dummy_base;
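A minimal sketch of this trick, with two hypothetical helper names (encode_small and decode_small). As a simplification, the array is declared without const here so that no qualifier has to be cast away. Values from 0 to 65535 are assumed: dummy_base + 65535 is the one-past-the-end pointer, which is still valid to form and to subtract from.

```c
#include <stddef.h>

/* Never read or written; it only supplies 65536 distinct valid addresses
   (65535 elements plus the one-past-the-end position). */
static char dummy_base[65535];

/* Hypothetical helper: encodes i (0..65535) as a pointer into dummy_base. */
void *encode_small(unsigned i)
{
    return dummy_base + i;
}

/* Hypothetical helper: recovers the integer by pointer subtraction,
   which is well defined because both pointers refer to the same array. */
unsigned decode_small(void *arg)
{
    ptrdiff_t offset = (char *)arg - dummy_base;
    return (unsigned)offset;
}
```

The thread would receive encode_small(i) as its argument and call decode_small(arg) in its start routine. The only conversions involved are between char* and void*, never between an integer and a pointer.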

Answer 5:

I think your answer is in the text you quoted:

If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

So, not necessarily. Say you had a 64-bit long and cast it to a void* on a 32-bit machine. The pointer is likely 32 bits, so either you lose the top 32 bits or get INT_MAX back. Or, potentially, something else entirely (undefined, as the standard says).
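The "lose the top 32 bits" outcome can be simulated without touching pointers at all. In this sketch uint32_t stands in for a 32-bit pointer representation; that stand-in is an assumption made for illustration, since what a real integer-to-pointer conversion does is implementation-defined. The function name through_32_bits is made up.

```c
#include <stdint.h>

/* Hypothetical model of "store a 64-bit value in 32 bits and read it
   back". Unsigned narrowing is defined to keep only the low 32 bits. */
uint64_t through_32_bits(uint64_t value)
{
    uint32_t narrowed = (uint32_t)value;  /* upper 32 bits are discarded */
    return (uint64_t)narrowed;
}
```

A value that fits in 32 bits comes back unchanged, while a value such as 0x100000002A comes back as 0x2A. A real implementation is free to do something else entirely.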

Source: https://stackoverflow.com/questions/7822904/is-it-always-safe-to-convert-an-integer-value-to-void-and-back-again-in-posix

Tags

pthreads

posix

c99