Help me reason about F# threads

Question

In goofing around with some F# (via MonoDevelop), I have written a routine which lists files in a directory with one thread:

open System.IO

// Recursively list every file under `path`, all on the calling thread.
let rec loop (path:string) =
  Array.append
    (
        path |> Directory.GetFiles
    )
    (
        path
        |> Directory.GetDirectories
        |> Array.map loop
        |> Array.concat
    )

And then an asynchronous version of it:

let rec loopPar (path:string) = 
  Array.append
    ( 
        path |> Directory.GetFiles
    )
    ( 
        let paths = path |> Directory.GetDirectories
        if paths <> [||] then
            [| for p in paths -> async { return (loopPar p) } |]
            |> Async.Parallel
            |> Async.RunSynchronously 
            |> Array.concat
        else 
            [||]
    )

On small directories, the asynchronous version works fine. On bigger directories (e.g. many thousands of directories and files), the asynchronous version seems to hang. What am I missing?

I know that creating thousands of threads is never going to be the most efficient solution, since I only have 8 CPUs, but I am baffled that for larger directories the asynchronous function simply doesn't respond (even after half an hour). It doesn't visibly fail, either. Is a thread pool being exhausted?

How do these threads actually work?

Edit:

According to this document:

Mono >=2.8.x has a new threadpool that is much, much harder to deadlock. If you get a threadpool deadlock chances are that your program is trying to be deadlocked.

Answer 1:

Yes, most likely you are overwhelming the Mono thread pool, which is grinding your system's performance to a halt.

If you remember one thing from this, it is that threads are expensive. Each thread needs its own stack (megabytes in size) and a slice of CPU time (requiring context switching). Because of this, it is rarely a good idea to spin up your own thread for short-lived tasks. That is why .NET has a ThreadPool.

A ThreadPool is an existing collection of threads for short tasks, and it is what F# uses for its Async workflows. Whenever you run an F# Async operation, it simply delegates the work to the thread pool.

The problem is, what happens when you spawn thousands of asynchronous actions in F# all at once? A naive implementation would simply spawn as many threads as needed. However, if you need 1,000 threads, that means you need 1,000 x 4 MB of stack space, or about 4 GB. Even if you had enough memory for all the stacks, your CPU would constantly be switching between the different threads (and paging the local stacks in and out of memory).
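To see how a given runtime handles a burst of work items, one rough check is to count how many distinct threads actually execute them. The sketch below is only an illustration: `distinctThreadsUsed` is a hypothetical helper, not something from the question, and the numbers it prints will vary by runtime, machine, and load.

```fsharp
open System.Collections.Concurrent
open System.Threading

// Hypothetical helper: run `count` tiny async work items in parallel and
// report how many distinct managed threads actually executed them.
let distinctThreadsUsed (count: int) : int =
    let seen = ConcurrentDictionary<int, bool>()
    Array.init count (fun _ ->
        async { seen.TryAdd(Thread.CurrentThread.ManagedThreadId, true) |> ignore })
    |> Async.Parallel
    |> Async.RunSynchronously
    |> ignore
    seen.Count

// The pool's configured upper bound on worker threads, for comparison.
let workerMax, _ = ThreadPool.GetMaxThreads()
printfn "pool max worker threads: %d, distinct threads used for 1000 items: %d"
    workerMax (distinctThreadsUsed 1000)
```

If the pool reuses its threads, the second number should come out well below 1,000, which suggests the work items were queued rather than given one thread each.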

IIRC, the Windows .NET implementation was smart enough not to spawn a ton of threads and simply queue the work up until there were some spare threads to perform the actions. In other words, it would keep adding threads until it had a fixed number and just use those. However, I don't know how Mono's thread pool is implemented.
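If the pool does queue work like this, the `Async.RunSynchronously` call inside each `loopPar` work item holds on to a pool thread while its child items wait in the queue. That may be part of why large trees stall. One way to avoid blocking inside the recursion is to make the function return an `Async` and wait for the children with `let!`. This is a minimal sketch, not the answer's own fix, and `loopAsync` and `listAll` are hypothetical names:

```fsharp
open System.IO

// Hypothetical non-blocking variant of loopPar: the recursion returns an
// Async<string[]> instead of calling Async.RunSynchronously at every level.
let rec loopAsync (path: string) : Async<string[]> =
    async {
        let files = Directory.GetFiles path
        // Child directories are composed into one parallel computation;
        // let! yields the thread instead of blocking it while they run.
        let! nested =
            Directory.GetDirectories path
            |> Array.map loopAsync
            |> Async.Parallel
        return Array.append files (Array.concat nested)
    }

// Only the outermost caller blocks, and it does so exactly once.
let listAll (root: string) : string[] =
    loopAsync root |> Async.RunSynchronously
```

A call such as `listAll "/some/dir"` should return the same set of files as `loop`, though not necessarily in the same order of discovery. The directory reads themselves are still synchronous calls, so this removes only the nested blocking, not the I/O cost.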

tl;dr: This is working as expected.

Answer 2:

Chris is probably right. The other angle to consider is that filesystems are not fixed things -- are those directories with thousands of files changing as you are trying to process the list? If so, that could be causing a race condition somewhere.

Source: https://stackoverflow.com/questions/4551133/help-me-reason-about-f-threads

Tags

multithreading

asynchronous

mono