Fire-and-forget (without do!) vs. fire-and-await (do!): why such a huge performance difference?

Question

The following code takes about 20 seconds to run. However, it takes less than a second after uncommenting the do! line. Why is there such a huge difference?

Update: it takes 9 seconds when using ag.Add. I've updated the code.

open FSharpx.Control

let test () =
    let ag = new BlockingQueueAgent<int option>(500)

    let enqueue() = async { 
        for i = 1 to 500 do 
            //do! ag.AsyncAdd (Some i) // less than a second with do!
            ag.AsyncAdd (Some i)       // it takes about 20 seconds without do!
            //ag.Add (Some i)          // This one takes about 9 seconds
            //printfn "=> %d" i 
            }

    async {
        do! [ for i = 1 to 100 do yield enqueue() ] 
            |> Async.Parallel |> Async.Ignore
        for i = 1 to 5 do ag.Add None
    } |> Async.Start

    let rec dequeue() =
        async {
            let! m = ag.AsyncGet()
            match m with
            | Some v ->
                //printfn "<= %d" v
                return! dequeue()
            | None -> 
                printfn "Done" 
        }

    [ for i = 1 to 5 do yield dequeue() ] 
    |> Async.Parallel |> Async.Ignore |> Async.RunSynchronously
    0

Answer 1:

Continuing from this question, here is an experiment based on your code:

// Learn more about F# at http://fsharp.org
module Test.T1

open System
open System.Collections.Generic
open System.Diagnostics

type Msg<'T> = 
    | AsyncAdd of 'T * AsyncReplyChannel<unit> 
    | Add of 'T
    | AsyncGet of AsyncReplyChannel<'T>

let sw = Stopwatch()
let mutable scanned = 0
let mutable scanTimeStart = 0L
let createQueue maxLength = MailboxProcessor.Start(fun inbox -> 
    let queue = new Queue<'T>()
    let rec emptyQueue() = 
        inbox.Scan(fun msg ->
          match msg with 
          | AsyncAdd(value, reply) -> Some(enqueueAndContinueWithReply(value, reply))
          | Add(value) -> Some(enqueueAndContinue(value))
          | _ -> None )
    and fullQueue() =
        scanTimeStart <- sw.ElapsedMilliseconds 
        inbox.Scan(fun msg ->
          scanned <- scanned + 1          
          match msg with 
          | AsyncGet(reply) ->                
            Some(dequeueAndContinue(reply))
          | _ -> None )
    and runningQueue() = async {
        let! msg = inbox.Receive()
        scanTimeStart <- sw.ElapsedMilliseconds 
        match msg with 
        | AsyncAdd(value, reply) -> return! enqueueAndContinueWithReply(value, reply)
        | Add(value) -> return! enqueueAndContinue(value)
        | AsyncGet(reply) -> return! dequeueAndContinue(reply) }
    and enqueueAndContinueWithReply (value, reply) = async {
        reply.Reply() 
        queue.Enqueue(value)
        return! chooseState() }
    and enqueueAndContinue (value) = async {
        queue.Enqueue(value)
        return! chooseState() }
    and dequeueAndContinue (reply) = async {
        let timestamp = sw.ElapsedMilliseconds
        printfn "[AsyncGet] messages cnt/scanned: %d/%d, timestamp/scanTime: %d/%d" inbox.CurrentQueueLength scanned timestamp (timestamp - scanTimeStart)
        scanned <- 0
        reply.Reply(queue.Dequeue())
        return! chooseState() }
    and chooseState() = 
        if queue.Count = 0 then emptyQueue()
        elif queue.Count < maxLength then runningQueue()
        else fullQueue()    
    emptyQueue())
let mb = createQueue<int option> 500    
let addWithReply v = mb.PostAndAsyncReply(fun ch -> AsyncAdd(v, ch))
let addAndForget v = mb.Post(Add v)
let get() = mb.PostAndAsyncReply(AsyncGet) 


[<EntryPoint>]
let main args = 
    sw.Start()
    let enqueue() = async { 
        for i = 1 to 500 do 
            //do! addWithReply (Some i) // ag.AsyncAdd with do!: less than a second
            addWithReply (Some i)       // ag.AsyncAdd without do!: about 20 seconds
            //addAndForget (Some i)     // ag.Add: about 9 seconds
            //printfn "=> %d" i 
            }

    async {
        do! [ for i = 1 to 100 do yield enqueue() ] 
            |> Async.Parallel |> Async.Ignore
        for i = 1 to 5 do addAndForget None
    } |> Async.Start

    let rec dequeue() =
        async {
            let! m = get()
            match m with
            | Some v ->
                //printfn "<= %d" v
                return! dequeue()
            | None -> 
                printfn "Done" 
        }

    [ for i = 1 to 5 do yield dequeue() ] 
    |> Async.Parallel |> Async.Ignore |> Async.RunSynchronously
    sw.Stop()
    printfn "Total elapsed: %dms" sw.ElapsedMilliseconds
    0

addWithReply is AsyncAdd. When we run without do! the output is (part of it):

...
[AsyncGet] messages cnt/scanned: 48453/48450, timestamp/scanTime: 3755/6
[AsyncGet] messages cnt/scanned: 48452/48449, timestamp/scanTime: 3758/3
[AsyncGet] messages cnt/scanned: 48451/48448, timestamp/scanTime: 3761/3
[AsyncGet] messages cnt/scanned: 48450/48447, timestamp/scanTime: 3764/3
...

So as you can see, without do! you essentially add all 50,000 enqueue requests to the mailbox's message queue. The dequeue threads are slower here, so their requests land only at the end of the message queue. The last line of the output shows 48,450 messages in the mailbox while the item queue is full (500 items). To free one slot, Scan has to walk past 48,447 messages, because all of them are AsyncAdd rather than AsyncGet. scanTime is 2-3 ms (on my machine), which approximates the time spent in MailboxProcessor.Scan.

When we add do!, the message queue has a different shape (see the output):
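The scan cost described above can be reproduced in isolation. The sketch below is not the code from the experiment: `ScanMsg`, `timeScan`, and the backlog sizes are hypothetical names and values chosen for illustration. It assumes that `PostAndAsyncReply` posts its message at the moment it is called, which is what the experiment's output suggests. It fills a mailbox with `Add` messages before the agent starts, queues a single `Get` behind them, and times how long `Scan` takes to reach it.

```fsharp
open System.Diagnostics

// Hypothetical message type for this illustration only.
type ScanMsg =
    | Add of int
    | Get of AsyncReplyChannel<unit>

// Returns the milliseconds needed to answer one Get queued behind `backlog` Adds.
let timeScan (backlog: int) =
    // The agent is created but not started yet, so posted messages simply accumulate.
    let agent =
        new MailboxProcessor<ScanMsg>(fun inbox ->
            // Like fullQueue(): only a Get is accepted, every Add is examined and skipped.
            inbox.Scan(fun msg ->
                match msg with
                | Get reply -> Some(async { reply.Reply() })
                | Add _ -> None))
    for i in 1 .. backlog do
        agent.Post(Add i)
    // Assumption: the Get is enqueued here; the returned async only waits for the reply.
    let reply = agent.PostAndAsyncReply(Get)
    let sw = Stopwatch.StartNew()
    agent.Start()
    Async.RunSynchronously reply
    sw.ElapsedMilliseconds

for backlog in [ 1000; 10000; 50000 ] do
    printfn "backlog %d: %d ms" backlog (timeScan backlog)
```

The absolute numbers depend on the machine. The point is only that the time to serve a single `Get` grows with the number of unrelated messages sitting in front of it.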

[AsyncGet] messages cnt/scanned: 98/96, timestamp/scanTime: 1561/0
[AsyncGet] messages cnt/scanned: 96/96, timestamp/scanTime: 1561/0
[AsyncGet] messages cnt/scanned: 104/96, timestamp/scanTime: 1561/0
[AsyncGet] messages cnt/scanned: 102/96, timestamp/scanTime: 1561/0

The number of messages in the message queue is roughly the number of enqueue threads, because each of them now waits for its reply before posting again.

What I cannot understand from the experiment yet is when you change AsyncAdd to Add, you still spam the MailboxProcessor:

[AsyncGet] messages cnt/scanned: 47551/47548, timestamp/scanTime: 3069/1
[AsyncGet] messages cnt/scanned: 47550/47547, timestamp/scanTime: 3070/1
[AsyncGet] messages cnt/scanned: 47549/47546, timestamp/scanTime: 3073/3
[AsyncGet] messages cnt/scanned: 47548/47545, timestamp/scanTime: 3077/2

but the average time spent on a scan is ~1 ms, which is faster than with AsyncReplyChannel. My guess is that this is connected to how AsyncReplyChannel is implemented. It depends on ManualResetEvent, so internally there could be another queue of such events per process, and each AsyncGet may have to scan this queue when an AsyncReplyChannel is created.

Answer 2:

Without the do!, you're not awaiting the results of AsyncAdd. That means that you're kicking off five hundred AsyncAdd operations as fast as possible for each call to enqueue(). And although each AsyncAdd call will block if the queue is full, if you don't await the result of AsyncAdd then your enqueue() code won't be blocked, and it will continue to launch new AsyncAdd operations.

And since you're launching 100 enqueue() operations in parallel, that's potentially up to fifty thousand AsyncAdd operations that will be trying to run at the same time, which means 49,500 blocked threads being handled by the thread pool. That's a LOT of demand to put on your system. In practice, you won't launch 100 enqueue() operations in parallel at the same time, but you'll launch as many enqueue() operations as you have logical CPUs. For the rest of this answer, I'm going to assume that you have a quad-core processor with hyperthreading (as your question "F# Async.Parallel |> Async.RunSynchronously only uses one of the eight CPU core?" seems to suggest). That gives 8 logical CPUs, so you'll launch eight copies of enqueue() before anything blocks, meaning you'll have 4,000 AsyncAdd threads running, 3,500 of which will be blocked.

When you use do!, on the other hand, then if AsyncAdd is blocked, your enqueue() operation will also block until there's a slot open in the queue. So once there are 500 items in the queue, instead of (8*500 - 500 = 3500) blocked AsyncAdd threads sitting in the thread pool, there will be 8 blocked AsyncAdd threads (one for each of the eight enqueue() operations running on each of your eight logical CPUs). Eight blocked threads instead of 3,500 means that the thread pool isn't making 3,500 allocations, using much less RAM and much less CPU time to process all those threads.
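The effect of dropping do! can be seen in a minimal sketch. It assumes that AsyncAdd is a thin wrapper over `PostAndAsyncReply`, as in the experiment from the first answer, and that `PostAndAsyncReply` posts its message as soon as it is called. `DemoMsg`, `idle`, and `queued` are hypothetical names used only here.

```fsharp
// Hypothetical message type for this illustration only.
type DemoMsg = Ping of AsyncReplyChannel<unit>

// An agent whose body returns immediately, so nothing is ever taken out of the mailbox.
let idle = MailboxProcessor<DemoMsg>.Start(fun _ -> async { return () })

// Call PostAndAsyncReply 1000 times and discard the results,
// which is what the loop without do! does with AsyncAdd.
for _ in 1 .. 1000 do
    idle.PostAndAsyncReply(Ping) |> ignore

// Number of messages waiting in the mailbox although nothing was awaited.
let queued = idle.CurrentQueueLength
printfn "messages queued without awaiting: %d" queued
```

With do! in front of the call, the same loop would stop at the first message and wait for a reply, so the mailbox could never hold more than one pending request per producer.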

As I said in my answer to your previous question, it really seems like you need a deeper understanding of asynchronous operations. Besides the articles I linked to in that answer (this article and this series), I'm also going to recommend reading https://medium.com/jettech/f-async-guide-eb3c8a2d180a which is a pretty long and detailed guide to F# async operations and some of the "gotchas" you can encounter. I'd strongly suggest going and reading those articles, then coming back and looking at your questions again. With the deeper understanding you've gained from reading those articles, you just might be able to answer your own questions!

Source: https://stackoverflow.com/questions/57652524/fire-and-no-wait-without-do-vs-fire-and-await-do-got-huge-difference-perfo
