How does Rust achieve compile-time-only pointer safety?

一笑奈何 提交于 2019-12-03 05:23:06

Disclaimer: I'm in a bit of a hurry, so this is a bit meandering. Feel free to clean it up.

The One Sneaky Trick That Language Designers Hate™ is basically this: Rust can only reason about the 'static lifetime (used for global variables and other whole-program lifetime things) and the lifetime of stack (i.e. local) variables: it cannot express or reason about the lifetime of heap allocations.

This means a few things. First of all, all of the library types that deal with heap allocations (i.e. Box<T>, Rc<T>, Arc<T>) all own the thing they point to. As a result, they don't actually need lifetimes in order to exist.

Where you do need lifetimes is when you're accessing the contents of a smart pointer. For example:

let mut x: Box<i32> = box 0;
*x = 42;

What is happening behind the scenes on that second line is this:

{
    let box_ref: &mut Box<i32> = &mut x;
    let heap_ref: &mut i32 = box_ref.deref_mut();
    *heap_ref = 42;
}

In other words, because Box isn't magic, we have to tell the compiler how to turn it into a regular, run of the mill borrowed pointer. This is what the Deref and DerefMut traits are for. This raises the question: what, exactly, is the lifetime of heap_ref?

The answer to this is in the definition of DerefMut (from memory because I'm in a hurry):

trait DerefMut {
    type Target;
    fn deref_mut<'a>(&'a mut self) -> &'a mut Target;
}

Like I said before, Rust absolutely cannot talk about "heap lifetimes". Instead, it has to tie the lifetime of the heap-allocated i32 to the only other lifetime it has on hand: the lifetime of the Box.

What this means is that "complicated" things don't have an expressible lifetime, and thus have to own the thing they manage. When you convert a complicated smart pointer/handle into a simple borrowed pointer, that is the moment that you have to introduce a lifetime, and you usually just use the lifetime of the handle itself.

Actually, I should clarify: by "lifetime of the handle", I really mean "the lifetime of the variable in which the handle is currently being stored": lifetimes are really for storage, not for values. This is typically why newcomers to Rust get tripped up when they can't work out why they can't do something like:

fn thingy<'a>() -> (Box<i32>, &'a i32) {
    let x = box 1701;
    (x, &x)
}

"But... I know that the box will continue to live on, why does the compiler say it doesn't?!" Because Rust can't reason about heap lifetimes and must resort to tying the lifetime of &x to the variable x, not the heap allocation it happens to point to.

oli_obk

Is it the case that pointer checking isn't done entirely at compile-time, and Rust's smart pointers still introduce some runtime overhead compared to, say, raw pointers in C?

There are special runtime-checks for things that can't be checked at compile time. These are usually found in the cell crate. But in general, Rust checks everything at compile time and should produce the same code as you would in C (if your C-code isn't doing undefined stuff).

Or is it possible that the Rust compiler can't make fully correct decisions, and it sometimes needs to Just Trust The Programmer™, probably using one of the lifetime annotations (the ones with the <'lifetime_ident> syntax)? In this case, does this mean that the pointer/memory safety guarantee is not 100%, and still relies on the programmer writing correct code?

If the compiler cannot make the correct decision you get a compile time error telling you that the compiler cannot verify what you are doing. This might also restrict you from stuff you know is correct, but the compiler doesn't. You can always go to unsafe code in that case. But as you correctly assumed, then the compiler relies partly on the programmer.

The compiler checks the function's implementation, to see if it does exactly what the lifetimes say it does. Then, at the call-site of the function, it checks if the programmer uses the function correctly. This is similar to type-checking. A C++ compiler checks if you are returning an object of the correct type. Then it checks at the call-site if the returned object is stored in a variable of the correct type. At no time can the programmer of a function break the promise (except if unsafe is used, but you can always let the compiler enforce that no unsafe is used in your project)

Rust is continuously improved. More things may get legal in Rust once the compiler becomes smarter.

Another possibility is that Rust pointers are non-"universal" or restricted in some sense, so that the compiler can infer their properties entirely during compile-time, but they are not as useful as e. g. raw pointers in C or smart pointers in C++.

There's a few things that can go wrong in C:

  1. dangling pointers
  2. double free
  3. null pointers
  4. wild pointers

These don't happen in safe Rust.

  1. You can never have a pointer that points to an object no longer on the stack or heap. That's proven at compile time through lifetimes.
  2. You do not have manual memory management in Rust. Use a Box to allocate your objects (similar but not equal to a unique_ptr in C++)
  3. Again, no manual memory management. Boxes automatically free memory.
  4. In safe Rust you can create a pointer to any location, but you cannot dereference it. Any reference you create always is bound to an object.

There's a few things that can go wrong in C++:

  1. everything that can go wrong in C
  2. SmartPointers only help you not forget calling free. You can still create dangling references: auto x = make_unique<int>(42); auto& y = *x; x.reset(); y = 99;

Rust fixes those:

  1. see above
  2. as long as y exists, you may not modify x. This is checked at compile time and cannot be circumvented by more levels of indirection or structs.

I have read somewhere that in a language that features pointers, it is not possible for the compiler to decide fully at compile time whether all pointers are used correctly and/or are valid (refer to an alive object) for various reasons, since that would essentially constitute solving the halting problem.

Rust doesn't prove you all pointers are used correctly. You may still write bogus programs. Rust proves that you are not using invalid pointers. Rust proves that you never have null-pointers. Rust proves that you never have two pointers to the same object, execept if all these pointers are non-mutable (const). Rust does not allow you to write any program (since that would include programs that violate memory safety). Right now Rust still prevents you from writing some useful programs, but there are plans to allow more (legal) programs to be written in safe Rust.

That is not surprising, intuitively, because in this case, we would be able to infer the runtime behavior of a program during compile-time, similarly to what's stated in this related question.

Revisiting the example in your referenced question about the halting problem:

void foo() {
    if (bar() == 0) this->a = 1;
}

The above C++ code would look one of two ways in Rust:

fn foo(&mut self) {
    if self.bar() == 0 {
        self.a = 1;
    }
}

fn foo(&mut self) {
    if bar() == 0 {
        self.a = 1;
    }
}

For an arbitrary bar you cannot prove this, because it might access global state. Rust soon gets const functions, which can be used to compute stuff at compile-time (similar to constexpr). If bar is const, it becomes trivial to prove if self.a is set to 1 at compile-time. Other than that, without pure functions or other restrictions of the function content, you can never prove whether self.a is set to 1 or not.

Rust currently doesn't care whether your code is called or not. It cares whether the memory of self.a still exists during the assignment. self.bar() can never destroy self (except in unsafe code). Therefor self.a will always be available inside the if branch.

Most of the safety of Rust references is guaranteed by strict rules:

  • If you posses a const reference (&), you can clone this reference and pass it around, but not create a mutable &mut reference out of it.
  • If a mutable (&mut) reference to an object exists, no other reference to this object can exist.
  • A reference is not allowed to outlive the object it refers to, and all functions manipulating references must declare how the references from their input and output are linked, using lifetime annotations (like 'a).

So in terms of expressiveness, we are effectively more limited than when using plain raw pointers (for example, building a graph structure is not possible using only safe references), but these rules can effectively be completely checked at compile-time.

Yet, it is still possible to use raw pointers, but you have to enclose the code dealing with them in a unsafe { /* ... */ } block, telling to the compiler "Trust me, I know what I am doing here". That is what some special smart pointers do internally, such as RefCell, which allows you to have these rules checked at runtime rather than compile-time, to gain expressiveness.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!