Why does Rust disallow mutable aliasing?

Question

Rust disallows this kind of code because it is unsafe:

fn main() {
    let mut i = 42;
    let ref_to_i_1 = unsafe { &mut *(&mut i as *mut i32) };
    let ref_to_i_2 = unsafe { &mut *(&mut i as *mut i32) };

    *ref_to_i_1 = 1;
    *ref_to_i_2 = 2;
}

How can I do something bad (e.g. segmentation fault, undefined behavior, etc.) with multiple mutable references to the same thing?

The only possible issues I can see come from the lifetime of the data. Here, if i is alive, each mutable reference to it should be ok.

I can see how there might be problems when threads are introduced, but why is it prevented even if I do everything in one thread?

Answer 1:

How can I do something bad (e.g. segmentation fault, undefined behavior, etc.) with multiple mutable references to the same thing?

I believe that although this is formally 'undefined behavior', the Rust compiler does not yet use the noalias flag, so in practice you probably can't make anything go wrong this way right now. What you are triggering is effectively 'implementation-specific behavior', which is 'behaves like C++ according to LLVM'.

There's a tracking issue.

I can see how there might be problems when threads are introduced, but why is it prevented even if I do everything in one thread?

Have a read of this series of blog articles about undefined behavior.

In my opinion, race conditions (like iterators) aren't really a good example of what you're talking about; in a single threaded environment you can avoid that sort of problem if you're careful. This is no different to creating an arbitrary pointer to invalid memory and writing to it; just don't do it. You're no worse off than using C.

To understand the issue here, consider that when compiling in release mode the compiler may reorder statements as part of its optimizations. That means that although your code is written as the linear sequence:

a; b; c;

There is no guarantee that the compiled program executes them in that sequence if, according to what the compiler knows, there is no logical reason the statements must be performed in that specific order. Part 3 of the blog I've linked to above demonstrates how this can cause undefined behavior.

tl;dr: Basically, the compiler may perform various optimizations; these are guaranteed to continue to make your program behave in a deterministic fashion if and only if your program does not trigger undefined behavior.

As far as I'm aware the Rust compiler currently doesn't use many 'advanced optimizations' that may cause this kind of failure, but there is no guarantee that it won't in the future. It is not a 'breaking change' to introduce new compiler optimizations.

So... it's actually probably quite unlikely you'll be able to trigger actual undefined behavior just via mutable aliasing right now; but the restriction allows the possibility of future performance optimizations.
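As a rough illustration of the kind of optimization the no-aliasing rule permits, consider the sketch below. The function name add_twice is made up for this example and does not come from the linked material. Because a is a mutable reference and b is a shared one, safe Rust guarantees they refer to different memory, so a compiler would be allowed to load *b once and reuse it. Whether a given compiler version actually does so is not something this example demonstrates.

```rust
// Hypothetical helper, for illustration only.
// `a` and `b` cannot alias in safe Rust, so the compiler is allowed to
// read `*b` a single time and reuse the value for both additions.
fn add_twice(a: &mut i32, b: &i32) {
    *a += *b;
    *a += *b;
}

fn main() {
    let mut x = 1;
    let y = 10;
    add_twice(&mut x, &y);
    println!("{}", x); // prints 21

    // The aliased call is rejected at compile time:
    // add_twice(&mut x, &x);
    // error[E0502]: cannot borrow `x` as immutable because it is also
    // borrowed as mutable
}
```

If the two references were allowed to alias, the second addition would have to reload *b, because the first addition could have changed it.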

Pertinent quote:

The C FAQ defines “undefined behavior” like this:

Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

Answer 2:

A really common pitfall in C++ programs, and even in Java programs, is modifying a collection while iterating over it, like this:

for (it: collection) {
    if (predicate(*it)) {
        collection.remove(it);
    }
}

For C++ standard library collections, this causes undefined behaviour. Maybe the iteration will work until you get to the last entry, but the last entry will dereference a dangling pointer or read off the end of an array. Maybe the whole array underlying the collection will be relocated, and it'll fail immediately. Maybe it works most of the time but fails if a reallocation happens at the wrong time. In most Java standard collections, it's also undefined behaviour according to the language specification, but the collections tend to throw ConcurrentModificationException - a check which causes a runtime cost even when your code is correct. Neither language can detect the error during compilation.

This is a common example of a data race caused by concurrency, even in a single-threaded environment. Concurrency doesn't just mean parallelism: it can also mean nested computation. In Rust, this kind of mistake is detected during compilation because the iterator has an immutable borrow of the collection, so you can't mutate the collection while the iterator is alive.
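A minimal Rust sketch of the same situation follows. The helper name remove_matching is an assumption made for this example. The commented-out loop is the direct translation of the pseudocode above and is rejected by the borrow checker; Vec::retain is one way to get the intended effect within a single mutable borrow.

```rust
// Hypothetical helper: removes every element for which `predicate` is true.
fn remove_matching(collection: &mut Vec<i32>, predicate: impl Fn(&i32) -> bool) {
    // Direct translation of the loop above. It does not compile, because the
    // iterator holds a shared borrow while `remove` needs a mutable one:
    //
    // for (index, item) in collection.iter().enumerate() {
    //     if predicate(item) {
    //         collection.remove(index); // error[E0502]
    //     }
    // }

    // `retain` keeps the elements for which the closure returns true and
    // does all of its work inside one mutable borrow.
    collection.retain(|item| !predicate(item));
}

fn main() {
    let mut numbers = vec![1, 2, 3, 4, 5, 6];
    remove_matching(&mut numbers, |n| n % 2 == 0);
    println!("{:?}", numbers); // prints [1, 3, 5]
}
```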

An easier-to-understand but less common example is pointer aliasing when you pass multiple pointers (or references) to a function. A concrete example would be passing overlapping memory ranges to memcpy instead of memmove. Actually, Rust's memcpy equivalent is unsafe too, but that's because it takes pointers instead of references. The linked page shows how you can make a safe swap function using the guarantee that mutable references never alias.
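For comparison, here is a small sketch of a copy that works on references rather than raw pointers. It is not taken from the linked page, and the function name copy_back_half_to_front is invented for this example. It relies on two standard library methods: split_at_mut, which returns two mutable slices that cannot overlap, and copy_from_slice, which takes its destination by mutable reference and its source by shared reference.

```rust
// Hypothetical helper: overwrites the front half of `buffer` with the
// first `mid` elements of the back half.
fn copy_back_half_to_front(buffer: &mut [i32]) {
    let mid = buffer.len() / 2;

    // The two returned slices are guaranteed to be disjoint.
    let (front, back) = buffer.split_at_mut(mid);

    // Source and destination are separate borrows, so they cannot overlap.
    // Both slices passed here have length `mid`, as `copy_from_slice` requires.
    front.copy_from_slice(&back[..mid]);
}

fn main() {
    let mut buffer = [1, 2, 3, 4, 5, 6];
    copy_back_half_to_front(&mut buffer);
    println!("{:?}", buffer); // prints [4, 5, 6, 4, 5, 6]
}
```

There is no way to pass overlapping ranges to copy_from_slice from safe code, which is the situation that makes memcpy undefined in C.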

A more contrived example of reference aliasing is like this:

int f(int *x, int *y) { return (*x)++ + (*y)++; }
int i = 3;
f(&i, &i); // result is undefined

You couldn't write a function call like that in Rust because you'd have to take two mutable borrows of the same variable.
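A rough Rust counterpart of the C function above, written for this example, shows the rejection. The aliased call is left commented out so the rest compiles, and a call with two distinct variables is shown next to it.

```rust
// Rust counterpart of the C function `f`: returns the sum of the old
// values and increments both arguments.
fn f(x: &mut i32, y: &mut i32) -> i32 {
    let sum = *x + *y;
    *x += 1;
    *y += 1;
    sum
}

fn main() {
    let mut i = 3;

    // Rejected at compile time:
    // f(&mut i, &mut i);
    // error[E0499]: cannot borrow `i` as mutable more than once at a time

    // With two distinct variables the call is accepted and well defined.
    let mut j = 3;
    let result = f(&mut i, &mut j);
    println!("{} {} {}", result, i, j); // prints 6 4 4
}
```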

Source: https://stackoverflow.com/questions/49174630/why-does-rust-disallow-mutable-aliasing

Tags

rust

undefined-behavior

lifetime