Why don't purely functional languages use reference counting?

Submitted by 北战南征 on 2020-02-17 07:01:33

Question


In purely functional languages, data is immutable. With reference counting, creating a reference cycle requires changing already created data. It seems like purely functional languages could use reference counting without worrying about the possibility of cycles. Am I right? If so, why don't they?

I understand that reference counting is slower than GC in many cases, but at least it reduces pause times. It would be nice to have the option to use reference counting in cases where GC pause times are a problem.


Answer 1:


Your question is based on a faulty assumption. It's perfectly possible to have circular references and immutable data. Consider the following C# example which uses immutable data to create a circular reference.

class Node { 
  public readonly Node other;
  public Node() { 
    other = new Node(this);
  }
  public Node(Node node) {
    other = node;
  }
}

This type of trick can be done in many functional languages, so any collection mechanism must deal with the possibility of circular references. I'm not saying a ref-counting mechanism is impossible in the presence of circular references, just that the cycles must be dealt with.

Edit by ephemient

In response to the comment... this is trivial in Haskell

data Node a = Node { other :: Node a }
recursiveNode = Node { other = recursiveNode }

and barely any more effort in SML.

datatype 'a node = NODE of unit -> 'a node
val recursiveNode : unit node =
    let fun mkRecursiveNode () = NODE mkRecursiveNode
    in mkRecursiveNode () end

No mutation required.




Answer 2:


Relative to other managed languages like Java and C#, purely functional languages allocate like crazy. They also allocate objects of different sizes. The fastest known allocation strategy is to allocate from contiguous free space (sometimes called a "nursery") and to reserve a hardware register to point to the next available free space. Allocation from the heap becomes as fast as allocation from a stack.
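As a rough illustration of that strategy, here is a minimal bump-pointer sketch in C#; the Nursery class, its size, and the exception used to stand in for a copying collection are all hypothetical, not any particular runtime's implementation.

using System;

// Minimal sketch of bump-pointer ("nursery") allocation.
// Purely illustrative -- not how any particular runtime lays out its heap.
class Nursery {
    private readonly byte[] space = new byte[1 << 20]; // contiguous free space
    private int next = 0;                               // stands in for the reserved register

    // Allocation is a bounds check plus an increment --
    // comparable in cost to bumping a stack pointer.
    public int Allocate(int size) {
        if (next + size > space.Length)
            throw new OutOfMemoryException("nursery full: a copying collection would run here");
        int addr = next;
        next += size;
        return addr; // offset of the freshly allocated object within the nursery
    }
}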

Reference counting is fundamentally incompatible with this allocation strategy. Ref counting puts objects on free lists and takes them off again. Ref counting also has substantial overheads required for updating ref counts as new objects are created (which, as noted above, pure functional languages do like crazy).
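For a sense of where that overhead comes from, here is a hedged sketch of what every reference store costs under naive reference counting; the RefCounted type and the Rc.Assign helper are hypothetical names chosen for illustration.

// Sketch of the per-assignment cost of naive reference counting.
class RefCounted {
    public int Count = 1; // one reference exists when the object is created
}

static class Rc {
    // Every pointer store becomes two counter updates plus a possible free --
    // work a bump-pointer nursery never has to do.
    public static void Assign(ref RefCounted slot, RefCounted newValue) {
        if (newValue != null) newValue.Count++;   // retain the new referent
        RefCounted old = slot;
        slot = newValue;
        if (old != null && --old.Count == 0) {
            // old is now dead: it goes back onto a free list
            // rather than simply un-bumping a pointer
        }
    }
}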

Reference counting tends to do really well in situations like these:

  • Almost all heap memory is used to hold live objects.
  • Allocation and pointer assignment are infrequent relative to other operations.
  • References can be managed on another processor or computer.

To understand how the best high-performance ref-counting systems work today, look up the work of David Bacon and Erez Petrank.




Answer 3:


There are a few things, I think.

  • There are cycles: "let rec" in many languages does allow "circular" structures to be created. Apart from this, immutability usually does imply no cycles, but "let rec" breaks that rule.
  • Ref-counts are bad at lists: I don't know that reference-counted collection works well with the long singly-linked-list structures you often find in FP: dropping the last reference to the head frees the whole list one cell at a time (slow, and the deallocation itself needs to be tail-recursive or iterative to avoid blowing the stack); see the sketch at the end of this answer.
  • Other strategies have benefits: as you allude to, other GC strategies are still usually better for memory locality.

(Once upon a time I think I maybe really 'knew' this, but now I am trying to remember/speculate, so don't take this as any authority.)
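To make the list point concrete, here is a hedged C# sketch (the RcCons and RcList names are hypothetical, and retains on construction are omitted) of why dropping a long reference-counted list is expensive: the release of the head cascades down every cell, so the work and the pause are proportional to the list length.

// Sketch: releasing the head of a reference-counted singly linked list.
class RcCons {
    public int Count = 1;                // references to this cell
    public readonly int Head;
    public readonly RcCons Tail;
    public RcCons(int head, RcCons tail) { Head = head; Tail = tail; }
}

static class RcList {
    // Dropping one reference to the head can end up freeing every cell:
    // the loop runs once per element, so the pause is proportional to the
    // length of the list. (Written as a loop rather than recursion so the
    // deallocation itself cannot overflow the stack.)
    public static void Release(RcCons node) {
        while (node != null && --node.Count == 0) {
            RcCons next = node.Tail;
            // this cell would be returned to a free list here
            node = next;
        }
    }
}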




Answer 4:


Consider this allegory told about David Moon, an inventor of the Lisp Machine:

One day a student came to Moon and said: "I understand how to make a better garbage collector. We must keep a reference count of the pointers to each cons."

Moon patiently told the student the following story:

"One day a student came to Moon and said: 'I understand how to make a better garbage collector...




Answer 5:


Am I right?

Not quite. You can create cyclic data structures using purely functional programming simply by defining mutually-recursive values at the same time. For example, in OCaml:

let rec xs = 0::ys and ys = 1::xs

However, it is possible to define languages that make it impossible to create cyclic structures by design. The result is known as a unidirectional heap and its primary advantage is that garbage collection can be as simple as reference counting.
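To sketch why such a design rules out cycles, here is an illustrative C# analogy (my own example, not any particular language's semantics): if fields are immutable, "this" cannot escape during construction, and there is no "let rec", then every object can only refer to objects built before it. References always point "backwards" in allocation order, the heap is a DAG, and a reference count hitting zero is a complete liveness test.

// Illustrative only: an immutable cell whose fields can refer solely to
// already-constructed cells, so the reference graph can never contain a cycle.
sealed class Cell {
    public readonly int Value;
    public readonly Cell Older;  // may only refer to a previously built Cell (or null)
    public Cell(int value, Cell older) {
        Value = value;
        Older = older;
        // Unlike the trick in Answer 1, `this` is never handed to another
        // constructor here, so no cycle can ever be formed.
    }
}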

If so, why don't they?

Some languages do prohibit cycles and use reference counting. Erlang and Mathematica are examples.

For example, in Mathematica when you reference a value you make a deep copy of it so mutating the original does not mutate the copy:

In[1] := xs = {1, 2, 3}
Out[1] = {1, 2, 3}

In[2] := ys = xs
Out[2] = {1, 2, 3}

In[3] := xs[[1]] = 5
Out[3] = 5

In[4] := xs
Out[4] = {5, 2, 3}

In[5] := ys
Out[5] = {1, 2, 3}



Answer 6:


Reference counting is MUCH slower than tracing GC because it is hard on the CPU: every pointer assignment has to update counts. Tracing GC, on the other hand, can often defer its work to idle time and can run concurrently (on another thread). So that's the problem: tracing GC is the lesser evil, and many attempts have shown that.



Source: https://stackoverflow.com/questions/791437/why-dont-purely-functional-languages-use-reference-counting
