Effective type rules with relation to strict aliasing

Question

So, I've been banging my head against the Strict Aliasing Rule and the effective type rules for the past couple of days. While the spirit of the rules is pretty clear, I'd like to nail down a solid technical understanding of them. Please note that I've gone through many related questions on SO, but I don't feel that the questions below have been answered anywhere else in a way that really sits well with me.

This question is divided into two parts.

In the first part, I divide the effective type rules into sentences and explain my own understanding of each one. For each of these, please confirm my understanding if it is correct, or correct me if it's flawed and explain why. For the last "sentence", I also present two questions that I would appreciate answers to.

The second part of the question concerns my understanding of the Strict Aliasing Rule (SAR).

Part 1: Effective type rules

Sentence 1

The effective type of an object for an access to its stored value is the declared type of the object, if any.

This is pretty clear - a declared object such as int x has a permanent effective type, which is the type it is declared with (int in this case).

Sentence 2

If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.

An "object having no declared type" is generally a dynamically allocated object.

When we store data inside an allocated object (whether or not it already has an effective type), the effective type of the object becomes the type of the lvalue used for the store (unless the lvalue is of character type). So for example:

int* x = malloc(sizeof(int)); // *x has no effective type yet
*x = 10; // *x has effective type int, because the type of lvalue *x is int

It is also possible to change the effective type of an object that already has an effective type. For example:

float* f = (float*) x;
*f = 20.5; // *x now has effective type float, because the type of lvalue *f is float.

Sentence 3

If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one.

This means that when we copy a value into an allocated object through lvalues of character type (char, signed char or unsigned char), or with memcpy or memmove, the effective type of the object becomes the effective type of the data that is copied into it. For example:

int* int_array = malloc(sizeof(int) * 5); // *int_array has no effective type yet
int other_int_array[] = {10, 20, 30, 40, 50};
char* other_as_char_array = (char*) other_int_array;
for (size_t i = 0; i < sizeof(int) * 5; i++) { // size_t avoids a signed/unsigned comparison
    *((char*) int_array + i) = other_as_char_array[i];
}
// *int_array now has effective type int

Sentence 4

For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

I have two questions regarding this part:

A. By "For all other accesses", does the text simply mean "for all read accesses"?

It seems to me that all the previous rules that refer to objects with no declared type deal only with storing a value. So is this simply the rule for any read operation on an object with no declared type (which may or may not already have an effective type)?

B. A particular object in memory only ever has one effective type at a time. So what does the text mean by "For all other accesses"? It's not a matter of the access; it's a matter of the effective type the object actually has. Isn't it? Please clarify the language of the text.

Part 2: A question about Strict Aliasing

The strict aliasing rule (C11 6.5/7) starts like so:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types [...]

When the text says "stored value accessed", does it mean both read and write access, or only read access?

To put it another way: does the following code constitute a Strict Aliasing violation, or is it legal?

int* x = malloc(sizeof(int)); // *x - no effective type yet
*x = 8; // *x - effective type int
printf("%d \n", *x); // access the int object through lvalue *x

float* f = (float*) x; // casting itself is legal
*f = 12.5; // effective type of *x changes to float - *** is this a SAR violation? ***
printf("%g \n", *f); // access the float object through lvalue *f

Answer 1:

"Access" means read or write. "For all other accesses" means any accesses not already covered in that paragraph. To recap, the accesses to objects with no declared type that have been covered are:

a value is stored into an object having no declared type through an lvalue having a type that is not a character type,
subsequent accesses that do not modify the stored value
a value is copied into an object having no declared type using memcpy or memmove
or is copied as an array of character type

So the remaining cases of "all reads and writes" are:

a value is stored into an object having no declared type through an lvalue having a type that is a character type,
any other writes we didn't think of

According to the text of C11, the code in part 2 is correct, as per:

If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access

*x = 8; stores a value into an object having no declared type through an lvalue having a type that is not a character type. So the effective type of the object for this access is int, and then in 6.5/7 we have the object of effective type int being accessed by an lvalue of type int. The same reasoning applies to *f = 12.5, with float instead of int.

Footnote: there are many reasons to believe that the text of 6.5/6 and 6.5/7 is defective, as you will have seen from searching other questions on the topic. People (and compiler writers) form their own interpretations of the rule.

Answer 2:

So far as I can tell, there has never been any consensus among committee members as to what the "Effective Type" rules are supposed to mean in all corner cases; any plausible interpretation will either forbid what should be useful optimizations, fail to accommodate what should be usable constructs, or both. So far as I can tell, no compiler that is nearly as "strict" as clang and gcc correctly handles all of the corner cases posed by the rules in a manner consistent with any reasonable interpretation of the Standard. Consider, for example:

struct s1 { char x[1]; };
struct s2 { char x[1]; };

void convert_p_to_s1(void *p)
{
    int q = ((struct s2*)p)->x[0]+1;
    ((struct s1*)p)->x[0] = q-1;
}

int test(struct s1 *p1, struct s2 *p2)
{
    p1->x[0] = 1;
    p2->x[0] = 2;
    convert_p_to_s1(p1);
    return p1->x[0];
}

Neither clang nor gcc will allow for the possibility that test might write member x[0] of a struct s1 to a location, then write that same location using member x[0] of a struct s2, then read using x[0] of a struct s2, write using x[0] of a struct s1, and finally read using x[0] of a struct s1. All of these reads and writes are performed by dereferencing pointers of type char*, and every read of an lvalue derived from a structure pointer is preceded by a write of that storage through an lvalue derived in the same way from a pointer of the same type.

Prior to C99, it was pretty much universally recognized that quality implementations should refrain from applying the type-access rules in ways that would be detrimental to their customers, without regard for whether the Standard would require such restraint. Because some implementations were used for purposes which required the ability to access objects in weird ways but wouldn't require fancy optimizations, while others were used for purposes that didn't need to access storage in tricky fashion but required more optimizations, the question of exactly when implementations should recognize that an access to one object might affect another was left as a Quality of Implementation issue.

Some authors of C99, however, likely objected to the fact that the rules didn't actually require implementations to support constructs that all implementations should support, and that nearly all implementations were in fact already supporting. To address what they saw as a defect, they added some additional rules which would mandate support for some constructs they thought all implementations should support, and which would deliberately not mandate support for some constructs for which universal support should not have been required. They do not appear, however, to have made any significant effort to consider corner cases and whether the rules would handle them sensibly.

The only way the Standard can ever say anything useful about pointer aliasing will be if the authors are willing to acknowledge that some tasks require stronger guarantees than others, and that implementations intended for different kinds of tasks should be expected to uphold different guarantees appropriate to those tasks. Otherwise, C should be treated as two families of dialects: one which requires that any storage ever accessed using a particular type never be accessed using any other type during its lifetime, and one which recognizes that operations on the target of a pointer freshly and visibly derived from a pointer of another type may affect the object identified by the original pointer.

Source: https://stackoverflow.com/questions/61297449/effective-type-rules-with-relation-to-strict-aliasing

Tags

language-lawyer

strict-aliasing