Functions that look pure to callers but internally use mutation

问题

I just got my copy of Expert F# 2.0 and came across this statement, which somewhat surprised me:

For example, when necessary, you can use side effects on private data structures allocated at the start of an algorithm and then discard these data structures before returning a result; the overall result is then effectively a side-effect-free function. One example of separation from the F# library is the library's implementation of List.map, which uses mutation internally; the writes occur on an internal, separated data structure that no other code can access.

Now, obviously the advantage to this approach is performance. I'm just curious if there are any disadvantages--do any of the pitfalls that can come with side-effects apply here? Is parallelisibility affected?

In other words, if performance were set aside, would it be preferable to implement List.map in a pure way?

(Obviously this deals with F# in particular, but I'm also curious about general philosophy)

回答1:

I think nearly every disadvantage of side-effects is tied to "interaction with other portions of the program". Side-effects themselves aren't bad (as @Gabe says, even a pure functional program is constantly mutating RAM), it's the common-side-consequences of effects (non-local interactions) which cause problems (with debugging/performance/understandability/etc.). So effects on purely local state (e.g. on a local variable that does not escape) are fine.

(The only detriment I can think of is that, when a human sees such a local mutable, they have to reason about whether it can escape. In F#, local mutables can never escape (closures cannot capture mutables), so the only potential "mental tax" comes from reasoning about mutable reference types.)

Summary: it's fine to use effects, so long as it's simple to convince one's self that the effects only happen on non-escaping locals. (It's also ok to use effects in other cases, but I'm ignoring those other cases, since on this question-thread we are the enlightened functional programmers trying to eschew effects whenever it's reasonable. :) )

(If you want to go very deep, local effects, like those in the implementation of F#'s List.map, are not only a non-hindrance to parallelizability, but actually a benefit, from the point of view that the more-efficient implementation allocates less, and thus is less of a strain on the shared resource of the GC.)

回答2:

You might be interested in Simon Peyton Jones's "Lazy Functional State Threads". I've only ever made it through the first few pages, which are very clear (I'm sure the rest is very clear as well).

The important point is that when you use Control.Monad.ST to do this kind of thing in Haskell, the type system itself enforces the encapsulation. In Scala (and probably F#) the approach is more "just trust us that we're not doing anything sneaky over here with this ListBuffer in your map".

回答3:

If a function uses local, private (to the function) mutable data structures, parallelization is not affected. So if the map function internally creates an array of the size of the list and iterates over its elements filling in the array, you could still run map 100 times concurrently on the same list and not worry because each instance of map will have its own private array. Since your code cannot see the contents of the array until it has been populated, it is effectively pure (remember, at some level your computer has to actually mutate the state of RAM).

On the other hand if a function uses global mutable data structures, parallelization could be affected. For example, let's say you have a Memoize function. Obviously the whole point of it is to maintain some global state (although "global" in the sense that it is not local to the function call, it is still "private" in the sense that it is not accessible outside the function) so that it doesn't have to run a function multiple times with the same arguments, yet it is still pure because the same inputs will always produce the same outputs. If your cache data structure is thread-safe (like ConcurrentDictionary) then you can still run your function in parallel with itself. If not, then you could argue that the function is not pure because it has side effects that are observable when run concurrently.

I should add that it is a common technique in F# to start off with a purely functional routine, then optimize it by taking advantage of mutable state (e.g. caching, explicit looping) when profiling shows that it's too slow.

回答4:

Same approach can be found in use in Clojure. The immutable data structures in Clojure - list, map and vector - have their "transient" counterparts which are mutable. The Clojure reference about transient urges to use them only in the code which cannot be seen by "any other code".

There are guards against leaking transients in client code:

The usual function which work on the immutable data structures don't work on transients. Calling them will throw an exception.
The transients are bound to the thread they are created in. Modifying them from any other thread will throw an exception.

The clojure.core code itself uses a lot of transients behind the scenes.

The main benefit of using transients is the massive speed-up they provide.

So the tightly controlled use of mutable state seems to be OK in the functional languages.

回答5:

It won't affect whether the function can be run in parallel with other functions. It will affect whether the function's internals can be made parallel though - but that's unlikely to be an issue for most small functions (such as map), targeting a PC.

I've noticed is that some good F# programmers (on the web, and in books) seem be very relaxed about using imperative techniques for loops. They seem to prefer a simple loop, with mutable loop variables, to a complex recursive function.

回答6:

One issue can be, that a good functional compiler is constructed to optimize "functional" code well, but if you're using some mutable stuff, the compiler may not optimize as good as in the other case. In the worst case, this leads to more inefficient algorithms then the immutable variant.

The other issue I can think of is laziness - a mutable data structure is usually not lazy, thus a mutable unction may force unneccesary evaluation of arguments.

回答7:

I would answer this with a question: "are you writing the function, or using the function?"

There are 2 aspects to functions, users and developers.

As a user, one doesn't care about the internal structure of a function at all. It can be coded in byte code and use hard side effects internally from now until judgement day so long as it matches the contract of data in->data out that one expects. A function is a black box or an oracle, it's internal structure is irrelevant(assuming it doesn't do anything stupid and external).

As a developer of a function, the internal structure matters a lot. Immutability, const correctness and avoiding side effects all help develop and maintain a function, and expand the function into the parallel domain.

Many people develop a function that they then use, so both of these aspects apply.

What the advantages are of immutability vs. mutable structures is a different question.

来源：https://stackoverflow.com/questions/3697760/functions-that-look-pure-to-callers-but-internally-use-mutation

标签

haskell

functional-programming

monads

side-effects