I wonder whether the following code can be optimized to execute faster. I currently seem to max out at around 1.4 million simple messages per second on a pretty simple data
You can also increase the degrees of parallelism for dataflow blocks. This may offer an additional speedup and can also help with load balancing between linear tasks if you find one of your blocks acts as a bottleneck to the rest.