Can a “container_of” macro ever be strictly-conforming?

好久不见. 提交于 2019-11-29 17:25:50

问题


A commonly-used macro in the linux kernel (and other places) is container_of, which is (basically) defined as follows:

#define container_of(ptr, type, member) (((type) *)((char *)(ptr) - offsetof((type), (member))))

Which basically allows recovery of a "parent" structure given a pointer to one of its members:

struct foo {
    char ch;
    int bar;
};
...
struct foo f = ...
int *ptr = &f.bar; // 'ptr' points to the 'bar' member of 'struct foo' inside 'f'
struct foo *g = container_of(ptr, struct foo, bar);
// now, 'g' should point to 'f', i.e. 'g == &f'

However, it's not entirely clear whether the subtraction contained within container_of is considered undefined behavior.

On one hand, because bar inside struct foo is only a single integer, then only *ptr should be valid (as well as ptr + 1). Thus, the container_of effectively produces an expression like ptr - sizeof(int), which is undefined behavior (even without dereferencing).

On the other hand, §6.3.2.3 p.7 of the C standard states that converting a pointer to a different type and back again shall produce the same pointer. Therefore, "moving" a pointer to the middle of a struct foo object, then back to the beginning should produce the original pointer.

The main concern is the fact that implementations are allowed to check for out-of-bounds indexing at runtime. My interpretation of this and the aforementioned pointer equivalence requirement is that the bounds must be preserved across pointer casts (this includes pointer decay - otherwise, how could you use a pointer to iterate across an array?). Ergo, while ptr may only be an int pointer, and neither ptr - 1 nor *(ptr + 1) are valid, ptr should still have some notion of being in the middle of a structure, so that (char *)ptr - offsetof(struct foo, bar) is valid (even if the pointer is equal to ptr - 1 in practice).

Finally, I came across the fact that if you have something like:

int arr[5][5] = ...
int *p = &arr[0][0] + 5;
int *q = &arr[1][0];

while it's undefined behavior to dereference p, the pointer by itself is valid, and required to compare equal to q (see this question). This means that p and q compare the same, but can be different in some implementation-defined manner (such that only q can be dereferenced). This could mean that given the following:

// assume same 'struct foo' and 'f' declarations
char *p = (char *)&f.bar;
char *q = (char *)&f + offsetof(struct foo, bar);

p and q compare the same, but could have different boundaries associated with them, as the casts to (char *) come from pointers to incompatible types.


To sum it all up, the C standard isn't entirely clear about this type of behavior, and attempting to apply other parts of the standard (or, at least my interpretations of them) leads to conflicts. So, is it possible to define container_of in a strictly-conforming manner? If so, is the above definition correct?


This was discussed here after comments on my answer to this question.


回答1:


I think its strictly conforming or there's a big defect in the standard. Referring to your last example, the section on pointer arithmetic doesn't give the compiler any leeway to treat p and q differently. It isn't conditional on how the pointer value was obtained, only what object it points to.

Any interpretation that p and q could be treated differently in pointer arithmetic would require an interpretation that p and q do not point to the same object. Since since there's no implementation dependent behaviour in how you obtained p and q then that would mean they don't point to the same object on any implementation. That would in turn require that p == q be false on all implementations, and so would make all actual implementations non-conforming.




回答2:


I just want to answer this bit.

int arr[5][5] = ...
int *p = &arr[0][0] + 5;
int *q = &arr[1][0];

This is not UB. It is certain that p is a pointer to an element of the array, provided only that it is within bounds. In each case it points to the 6th element of a 25 element array, and can safely be dereferenced. It can also be incremented or decremented to access other elements of the array.

See n3797 S8.3.4 for C++. The wording is different for C, but the meaning is the same. In effect arrays have a standard layout and are well-behaved with respect to pointers.


Let us suppose for a moment that this is not so. What are the implications? We know that the layout of an array int[5][5] is identical to int[25], there can be no padding, alignment or other extraneous information. We also know that once p and q have been formed and given a value, they must be identical in every respect.

The only possibility is that, if the standard says it is UB and the compiler writer implements the standard, then a sufficiently vigilant compiler might either (a) issue a diagnostic based on analysing the data values or (b) apply an optimisation which was dependent on not straying outside the bounds of sub-arrays.

Somewhat reluctantly I have to admit that (b) is at least a possibility. I am led to the rather strange observation that if you can conceal from the compiler your true intentions this code is guaranteed to produce defined behaviour, but if you do it out in the open it may not.



来源:https://stackoverflow.com/questions/25296019/can-a-container-of-macro-ever-be-strictly-conforming

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!