How does pointer comparison work in C? Is it ok to compare pointers that don't point to the same array?

倾然丶 夕夏残阳落幕 提交于 2019-12-30 08:05:42

问题


In K&R (The C Programming Language 2nd Edition) chapter 5 I read the following:

First, pointers may be compared under certain circumstances. If p and q point to members of the same array, then relations like ==, !=, <, >=, etc. work properly.

Which seems to imply that only pointers pointing to the same array can be compared.

However when I tried this code

    char t = 't';
    char *pt = &t;
    char x = 'x';
    char *px = &x;

    printf("%d\n", pt > px);

1 is printed to the screen.

First of all, I thought I would get undefined or some type or error, because pt and px aren't pointing to the same array (at least in my understanding).

Also is pt > px because both pointers are pointing to variables stored on the stack, and the stack grows down, so the memory address of t is greater than that of x? Which is why pt > px is true?

I get more confused when malloc is brought in. Also in K&R in chapter 8.7 the following is written:

There is still one assumption, however, that pointers to different blocks returned by sbrk can be meaningfully compared. This is not guaranteed by the standard which permits pointer comparisons only within an array. Thus this version of malloc is portable only among machines for which the general pointer comparison is meaningful.

I had no issue comparing pointers that pointed to space malloced on the heap to pointers that pointed to stack variables.

For example, the following code worked fine, with 1 being printed:

    char t = 't';
    char *pt = &t;
    char *px = malloc(10);
    strcpy(px, pt);
    printf("%d\n", pt > px);

Based on my experiments with my compiler, I'm being led to think that any pointer can be compared with any other pointer, regardless of where they individually point. Moreover, I think pointer arithmetic between two pointers is fine, no matter where they individually point because the arithmetic is just using the memory addresses the pointers store.

Still, I am confused by what I am reading in K&R.

The reason I'm asking is because my prof. actually made it an exam question. He gave the following code:

struct A {
    char *p0;
    char *p1;
};

int main(int argc, char **argv) {
    char a = 0;
    char *b = "W";
    char c[] = [ 'L', 'O', 'L', 0 ];

   struct A p[3];
    p[0].p0 = &a;
    p[1].p0 = b;
    p[2].p0 = c;

   for(int i = 0; i < 3; i++) {
        p[i].p1 = malloc(10);
        strcpy(p[i].p1, p[i].p0);
    }
}

What do these evaluate to:

  1. p[0].p0 < p[0].p1
  2. p[1].p0 < p[1].p1
  3. p[2].p0 < p[2].p1

The answer is 0, 1, and 0.

(My professor does include the disclaimer on the exam that the questions are for a Ubuntu Linux 16.04, 64-bit version programming environment)

(editor's note: if SO allowed more tags, that last part would warrant x86-64, linux, and maybe assembly. If the point of the question / class was specifically low-level OS implementation details, rather than portable C.)


回答1:


According to the C11 standard, the relational operators <, <=, >, and >= may only be used on pointers to elements of the same array or struct object. This is spelled out in section 6.5.8p5:

When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object,pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.

Note that any comparisons that do not satisfy this requirement invoke undefined behavior, meaning (among other things) that you can't depend on the results to be repeatable.

In your particular case, for both the comparison between the addresses of two local variables and between the address of a local and a dynamic address, the operation appeared to "work", however the result could change by making a seemingly unrelated change to your code or even compiling the same code with different optimization settings. With undefined behavior, just because the could could crash or generate an error doesn't mean it will.

As an example, an x86 processor running in 8086 real mode has a segmented memory model using a 16-bit segment and a 16-bit offset to build a 20-bit address. So in this case an address doesn't convert exactly to an integer.

The equality operators == and != however do not have this restriction. They can be used between any two pointers to compatible types or NULL pointers. So using == or != in both of your examples would produce valid C code.

However, even with == and != you could get some unexpected yet still well-defined results. See Can an equality comparison of unrelated pointers evaluate to true? for more details on this.

Regarding the exam question given by your professor, it makes a number of flawed assumptions:

  • A flat memory model exists where there is a 1-to-1 correspondence between an address and an integer value
  • That the converted pointer values fit inside an integer type
  • That the implementation simply treats pointers as integers when performing comparisons without exploiting the freedom given by undefined behavior
  • That a stack is used and that local variables are stored there.
  • That a heap is used to pull allocated memory from
  • That the stack (and therefore local variables) appears at a higher address than the heap (and therefore allocated objects).
  • That string constants appear at a lower address that the heap.

If you were to run this code on an architecture and/or with a compiler that does not satisfy these assumptions then you could get very different results.

Also, both examples also exhibit undefined behavior when they call strcpy, since the right operand (in some cases) points to a single character and not a null terminated string, resulting in the function reading past the bounds of the given variable.




回答2:


The primary issue with comparing pointers to two distinct arrays of the same type is that the arrays themselves need not be placed in a particular relative positioning--one could end up before and after the other.

First of all, I thought I would get undefined or some type or error, because pt an px aren't pointing to the same array (at least in my understanding).

No, the result is dependent on implementation and other unpredictable factors.

Also is pt>px because both pointers are pointing to variables stored on the stack, and the stack grows down, so the memory address of t is greater than that of x? Which is why pt>px is true?

There isn't necessarily a stack. When it exists, it need not to grow down. It could grow up. It could be non-contiguous in some bizarre way.

Moreover, I think pointer arithmetic between two pointers is fine, no matter where they individually point because the arithmetic is just using the memory addresses the pointers store.

Let's look at the C specification, §6.5.8 on page 85 which discusses relational operators (i.e. the comparison operators you're using). Note that this does not apply to direct != or == comparison.

When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. ... If the objects pointed to are members of the same aggregate object, ... pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values.

In all other cases, the behavior is undefined.

The last sentence is important. While I cut down some unrelated cases to save space, there's one case that's important to us: two arrays, not part of the same struct/aggregate object1, and we're comparing pointers to those two arrays. This is undefined behavior.

While your compiler just inserted some sort of CMP (compare) machine instruction which numerically compares the pointers, and you got lucky here, UB is a pretty dangerous beast. Literally anything can happen--your compiler could optimize out the whole function including visible side effects. It could spawn nasal demons.

1Pointers into two different arrays that are part of the same struct can be compared, since this falls under the clause where the two arrays are part of the same aggregate object (the struct).




回答3:


Then asked what

p[0].p0 < p[0].p1
p[1].p0 < p[1].p1
p[2].p0 < p[2].p1

Evaluate to. The answer is 0, 1, and 0.

These questions reduce to:

  1. Is the heap above or below the stack.
  2. Is the heap above or below the string literal section of the program.
  3. same as [1].

And the answer to all three is "implementation defined". Your prof's questions are bogus; they have based it in traditional unix layout:

<empty>
text
rodata
rwdata
bss
< empty, used for heap >
...
stack
kernel

but several modern unices (and alternative systems) do not conform to those traditions. Unless they prefaced the question with " as of 1992 "; make sure to give a -1 on the eval.




回答4:


On almost any remotely-modern platform, pointers and integers have an isomorphic ordering relation, and pointers to disjoint objects are not interleaved. Most compilers expose this ordering to programmers when optimizations are disabled, but the Standard makes no distinction between platforms that have such an ordering and those that don't and does not require that any implementations expose such an ordering to the programmer even on platforms that would define it. Consequently, some compiler writers perform various kinds of optimizations and "optimizations" based upon an assumption that code will never compare use relational operators on pointers to different objects.

According to the published Rationale, the authors of the Standard intended that implementations extend the language by specifying how they will behave in situations the Standard characterizes as "Undefined Behavior" (i.e. where the Standard imposes no requirements) when doing so would be useful and practical, but some compiler writers would rather assume programs will never try to benefit from anything beyond what the Standard mandates, than allow programs to usefully exploit behaviors the platforms could support at no extra cost.

I'm not aware of any commercially-designed compilers that do anything weird with pointer comparisons, but as compilers move to the non-commercial LLVM for their back end, they're increasingly likely to process nonsensically code whose behavior had been specified by earlier compilers for their platforms. Such behavior isn't limited to relational operators, but can even affect equality/inequality. For example, even though the Standard specifies that a comparison between a pointer to one object and a "just past" pointer to an immediately-preceding object will compare equal, gcc and LLVM-based compilers are prone to generate nonsensical code if programs perform such comparisons.

As an example of a situation where even equality comparison behaves nonsensically in gcc and clang, consider:

extern int x[],y[];
int test(int i)
{
    int *p = y+i;
    y[0] = 4;
    if (p == x+10)
        *p = 1;
    return y[0];
}

Both clang and gcc will generate code that will always return 4 even if x is ten elements, y immediately follows it, and i is zero resulting in the comparison being true and p[0] being written with the value 1. I think what happens is that one pass of optimization rewrites the function as though *p = 1; were replaced with x[10] = 1;. The latter code would be equivalent if the compiler interpreted *(x+10) as equivalent to *(y+i), but unfortunately a downstream optimization stage recognizes that an access to x[10] would only defined if x had at least 11 elements, which would make it impossible for that access to affect y.

If compilers can get that "creative" with pointer equality scenario which is described by the Standard, I would not trust them to refrain from getting even more creative in cases where the Standard doesn't impose requirements.




回答5:


Pointer comparison in C -- and the languages derived from it (C++, Objective-C, etc.) -- is implemented at the machine level, which means that in effect the entities being compared are simply numbers. As a result, you will always get some kind of 'valid' result, while the suitability of that result to your present purpose may not be so valid.

This is the issue addressed in both quotations you provide, although they are concerned with different aspects of the problem.

When your only concern is whether two pointers are identical, or not, then comparison is inevitably valid. Consequently, the equality operators == and != will function properly and necessarily produce the desired and valid result.

Your quotations, however, address the relational operators (<, <=, > and >=). Since there can be no defined relationship between two separately allocated pointers -- regardless whether they refer to addresses allocated dynamically through malloc and co., or automatically as you have done in your example, relational comparison of such pointers can be made meaningful -- it cannot make valid sense. Which is why the standard describes the result as 'undefined'.

So; you are absolutely correct:

... any pointer can be compared with any other pointer, regardless of where they individually point ...

However, it is exceedingly dangerous to conclude that:

... arithmetic between two pointers is fine, no matter where they individually point ...

This is precisely what 'standards' strive to protect programmers from; crafting ill-formed, and invalid comparisons.

Since the final address of a variable depends on all of the compiler, the libraries in use, the platform the application is running on, and perhaps most influentially: the present execution environment, it is highly likely that the relationship between pointers that reference separately allocated blocks may be inverted the next time the code executes. Which means that one is not able to rely on the result being in one direction or the other. This can only work when both pointers refer to the same block of memory -- such as would apply to a single, C-style array.

Arithmetic is still more problematic than relational comparison, because it implies that you are expecting not merely a consistent order between the two, but a reproducible difference. While this may remain constant within a single -- relatively simple -- function, code that relies on such a relationship is brittle by definition, and therefore probably not the kind you intend to craft.




回答6:


Pointers are just integers, like everything else in a computer. You absolutely can compare them with < and > and produce results without causing a program to crash. That said, the standard does not guarantee that those results have any meaning outside of array comparisons.

In your example of stack allocated variables, the compiler is free to allocate those variables to registers or stack memory addresses, and in any order it so choose. Comparisons such as < and > therefore won't be consistent across compilers or architectures. However, == and != aren't so restricted, comparing pointer equality is a valid and useful operation.



来源:https://stackoverflow.com/questions/59516346/how-does-pointer-comparison-work-in-c-is-it-ok-to-compare-pointers-that-dont-p

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!