TPL Dataflow Speedup?

难免孤独 2021-01-01 02:55

I wonder whether the following code can be optimized to execute faster. I currently seem to max out at around 1.4 million simple messages per second on a pretty simple data

4 Answers
  • 2021-01-01 03:18

    If your workload is so granular that you expect to process millions of messages per second, passing individual messages through the pipeline is no longer viable because of the per-message overhead. You'll need to chunk the workload by batching the messages into arrays or lists. For example:

    var transform = new TransformBlock<int[], string[]>(batch =>
    {
        var results = new string[batch.Length];
        for (int i = 0; i < batch.Length; i++)
        {
            results[i] = ProcessItem(batch[i]);
        }
        return results;
    });
    

    To batch your input you could use a BatchBlock, the LINQ-style Buffer extension method from the System.Interactive package, the functionally similar Batch method from the MoreLinq package, or do it manually.
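
    As a rough sketch of the BatchBlock option, the following wires a BatchBlock<int> in front of a batch-processing TransformBlock. The ProcessItem body, the class name BatchingSketch, and the batch size of 1000 are assumptions made for illustration and are not taken from the question.

    ```csharp
    using System;
    using System.Linq;
    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow;

    public static class BatchingSketch
    {
        // Hypothetical stand-in for the real per-item work.
        static string ProcessItem(int item) => item.ToString();

        public static async Task<int> RunAsync(int itemCount, int batchSize)
        {
            // Groups individual ints into int[] batches of up to batchSize items.
            var batcher = new BatchBlock<int>(batchSize);

            // Processes a whole batch per message, so block overhead is paid once per batch.
            var transform = new TransformBlock<int[], string[]>(batch =>
                batch.Select(item => ProcessItem(item)).ToArray());

            int processed = 0;
            // Default MaxDegreeOfParallelism is 1, so this counter needs no locking.
            var sink = new ActionBlock<string[]>(results => { processed += results.Length; });

            var link = new DataflowLinkOptions { PropagateCompletion = true };
            batcher.LinkTo(transform, link);
            transform.LinkTo(sink, link);

            for (int i = 0; i < itemCount; i++)
            {
                batcher.Post(i);
            }

            // Completing the BatchBlock also emits any final, smaller batch.
            batcher.Complete();
            await sink.Completion;
            return processed;
        }

        public static async Task Main()
        {
            int processed = await RunAsync(1_000_000, 1000);
            Console.WriteLine($"Processed {processed} items");
        }
    }
    ```

    The best batch size depends on how expensive each item is, so it is worth measuring a few values rather than trusting the one used here.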

  • 2021-01-01 03:33

    You can also increase the degree of parallelism of individual dataflow blocks (the MaxDegreeOfParallelism option). This may offer an additional speedup and can also help balance load across the stages of a linear pipeline if you find that one of your blocks is a bottleneck for the rest.
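
    A minimal sketch of what that could look like, assuming a CPU-bound squaring step as a placeholder workload. The class name ParallelismSketch and the choice of Environment.ProcessorCount are illustrative assumptions.

    ```csharp
    using System;
    using System.Linq;
    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow;

    public static class ParallelismSketch
    {
        public static async Task<long> SumOfSquaresAsync(int count)
        {
            var options = new ExecutionDataflowBlockOptions
            {
                // Allow this block to process several messages concurrently.
                MaxDegreeOfParallelism = Environment.ProcessorCount
            };

            // Placeholder work: square each input.
            var square = new TransformBlock<int, long>(n => (long)n * n, options);

            long total = 0;
            // Left at the default degree of parallelism (1), so the sum needs no locking.
            var sum = new ActionBlock<long>(value => { total += value; });

            square.LinkTo(sum, new DataflowLinkOptions { PropagateCompletion = true });

            foreach (int n in Enumerable.Range(1, count))
            {
                square.Post(n);
            }

            square.Complete();
            await sum.Completion;
            return total;
        }

        public static async Task Main()
        {
            Console.WriteLine(await SumOfSquaresAsync(1000));
        }
    }
    ```

    For work as trivial as this squaring step, the extra coordination may well cost more than it gains. The setting tends to pay off when each message takes a noticeable amount of time.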

  • 2021-01-01 03:42

    I think this mostly comes down to one thing: your test is pretty much meaningless. All those blocks are supposed to do something, and use multiple cores and asynchronous operations to do that.

    Also, in your test, it's likely that a lot of time is spent on synchronization. With more realistic code, each block's work takes some time to execute, so there is less contention and the actual overhead is smaller than what you measured.

    But to actually answer your question: yes, you're overlooking some performance tweaks. Specifically, setting SingleProducerConstrained in the block options lets the block use data structures with less locking. If I use this on both blocks (the BufferBlock is completely useless here, you can safely remove it), the rate rises from about 3–4 million items per second to more than 5 million on my computer.
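
    A minimal sketch of setting that option, assuming a single thread is the only producer posting to the block. The counting ActionBlock and the class name SingleProducerSketch are illustrative and are not the code from the question.

    ```csharp
    using System;
    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow;

    public static class SingleProducerSketch
    {
        public static async Task<int> CountAsync(int messageCount)
        {
            var options = new ExecutionDataflowBlockOptions
            {
                // Only valid if the block is used by one producer at a time.
                SingleProducerConstrained = true
            };

            int received = 0;
            var counter = new ActionBlock<int>(_ => { received++; }, options);

            // This loop is the single producer: no other thread posts to the block.
            for (int i = 0; i < messageCount; i++)
            {
                counter.Post(i);
            }

            counter.Complete();
            await counter.Completion;
            return received;
        }

        public static async Task Main()
        {
            Console.WriteLine(await CountAsync(1_000_000));
        }
    }
    ```

    If several threads may post to the same block concurrently, leave the option at its default of false.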

  • 2021-01-01 03:44

    To add to svick's answer, the test uses only a single processing thread for a single action block. This way it tests nothing more than the overhead of using the blocks.

    Dataflow works in a manner similar to F# agents, Scala actors and MPI implementations. By default, each block executes a single task at a time, listening for input and producing output. Speedup comes from breaking an algorithm into steps that can be executed independently on multiple cores, passing only messages to each other.

    While you can increase the number of concurrent tasks, the most important issue is designing a flow that performs as many steps as possible independently of the others.
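
    To illustrate the idea of independent steps, here is a small sketch of a three-stage flow in which each stage only receives messages from the previous one. The word-counting workload and the class name StagedFlowSketch are assumptions chosen for brevity.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow;

    public static class StagedFlowSketch
    {
        public static async Task<int> CountWordsAsync(IEnumerable<string> lines)
        {
            // Stage 1: split a line into words.
            var split = new TransformBlock<string, string[]>(line => line.Split(' '));

            // Stage 2: reduce each word array to its length.
            var count = new TransformBlock<string[], int>(words => words.Length);

            // Stage 3: accumulate the total. One message at a time, so no locking.
            int total = 0;
            var sum = new ActionBlock<int>(n => { total += n; });

            var link = new DataflowLinkOptions { PropagateCompletion = true };
            split.LinkTo(count, link);
            count.LinkTo(sum, link);

            foreach (string line in lines)
            {
                split.Post(line);
            }

            split.Complete();
            await sum.Completion;
            return total;
        }

        public static async Task Main()
        {
            // prints 5
            Console.WriteLine(await CountWordsAsync(new[] { "a b c", "d e" }));
        }
    }
    ```

    Each stage runs on its own task, so while stage 2 handles one message, stage 1 can already be working on the next. The stages share no state other than the messages they pass along.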
