I have a TransformManyBlock with the following design:
It seems that to create an output-bounded TransformManyBlock, three internal blocks are needed:

1. A TransformBlock that receives the input and produces IEnumerables, running potentially in parallel.
2. An ActionBlock that enumerates the produced IEnumerables and propagates the final results.
3. A BufferBlock where the final results are stored, respecting the desirable BoundedCapacity.
The slightly tricky part is how to propagate the completion of the second block, because it is not directly linked to the third block. In the implementation below, the method PropagateCompletion is written according to the source code of the library.
public static IPropagatorBlock<TInput, TOutput>
    CreateOutputBoundedTransformManyBlock<TInput, TOutput>(
    Func<TInput, Task<IEnumerable<TOutput>>> transform,
    ExecutionDataflowBlockOptions dataflowBlockOptions)
{
    if (transform == null) throw new ArgumentNullException(nameof(transform));
    if (dataflowBlockOptions == null)
        throw new ArgumentNullException(nameof(dataflowBlockOptions));

    var input = new TransformBlock<TInput, IEnumerable<TOutput>>(transform,
        dataflowBlockOptions);
    var output = new BufferBlock<TOutput>(dataflowBlockOptions);
    var middle = new ActionBlock<IEnumerable<TOutput>>(async results =>
    {
        if (results == null) return;
        foreach (var result in results)
        {
            var accepted = await output.SendAsync(result).ConfigureAwait(false);
            if (!accepted) break; // If one is rejected, the rest will be rejected too
        }
    }, new ExecutionDataflowBlockOptions()
    {
        MaxDegreeOfParallelism = 1,
        BoundedCapacity = dataflowBlockOptions.MaxDegreeOfParallelism,
        CancellationToken = dataflowBlockOptions.CancellationToken,
        SingleProducerConstrained = true,
    });
    input.LinkTo(middle, new DataflowLinkOptions() { PropagateCompletion = true });

    PropagateCompletion(middle, output);

    return DataflowBlock.Encapsulate(input, output);

    async void PropagateCompletion(IDataflowBlock source, IDataflowBlock target)
    {
        try
        {
            await source.Completion.ConfigureAwait(false);
        }
        catch { }
        var exception = source.Completion.IsFaulted ? source.Completion.Exception : null;
        if (exception != null) target.Fault(exception); else target.Complete();
    }
}
// Overload with synchronous delegate
public static IPropagatorBlock<TInput, TOutput>
    CreateOutputBoundedTransformManyBlock<TInput, TOutput>(
    Func<TInput, IEnumerable<TOutput>> transform,
    ExecutionDataflowBlockOptions dataflowBlockOptions)
{
    return CreateOutputBoundedTransformManyBlock<TInput, TOutput>(
        item => Task.FromResult(transform(item)), dataflowBlockOptions);
}
Usage example:
var firstBlock = CreateOutputBoundedTransformManyBlock<char, string>(
    c => GetSequence(c), options);
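For context, here is a minimal sketch of how such a block could be wired into a small pipeline. The options, the consuming delegate and the "ABC" input below are assumptions for illustration, not part of the original example; GetSequence is assumed to come from the surrounding question.

async Task RunPipelineAsync()
{
    // Assumed options; BoundedCapacity here also bounds the block's *output*,
    // which is the whole point of the custom implementation above.
    var options = new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 2,
        BoundedCapacity = 10
    };

    var firstBlock = CreateOutputBoundedTransformManyBlock<char, string>(
        c => GetSequence(c), options);

    // A deliberately slow consumer, to show that the producer side is throttled.
    var consumer = new ActionBlock<string>(async s =>
    {
        await Task.Delay(100);
    }, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

    firstBlock.LinkTo(consumer, new DataflowLinkOptions { PropagateCompletion = true });

    foreach (var c in "ABC") await firstBlock.SendAsync(c);
    firstBlock.Complete();
    await consumer.Completion;
}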
If the output rate of the pipeline is lower than the post rate, messages will accumulate in the pipeline until memory runs out or some queue limit is reached. If the messages have a significant size, the process will soon be starved for memory.

Setting BoundedCapacity to 1 will cause a message to be rejected by the queue if the queue already holds one message. That is not the desired behavior in cases like batch processing, for example. Check this post for insights.
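As a rough sketch of that rejection behavior (not part of the original test): with BoundedCapacity = 1, Post returns false as soon as the single slot is occupied, while SendAsync asynchronously waits for free capacity instead of rejecting.

async Task DemoRejectionAsync()
{
    var block = new ActionBlock<int>(async i =>
    {
        await Task.Delay(1000); // keep the single capacity slot occupied for a while
    }, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

    bool first = block.Post(1);  // true:  the slot was free
    bool second = block.Post(2); // false: item 1 still counts against BoundedCapacity

    await block.SendAsync(3);    // not rejected; waits until capacity frees up

    block.Complete();
    await block.Completion;
}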
This working test illustrates my point:
// Change BoundedCapacity to +1 to see it fail
[TestMethod]
public void stackOverflow()
{
    var total = 1000;
    var processed = 0;
    var block = new ActionBlock<int>(
        (messageUnit) =>
        {
            Thread.Sleep(10);
            Trace.WriteLine($"{messageUnit}");
            processed++;
        },
        new ExecutionDataflowBlockOptions() { BoundedCapacity = -1 }
    );

    for (int i = 0; i < total; i++)
    {
        var result = block.SendAsync(i);
        Assert.IsTrue(result.IsCompleted, $"failed for {i}");
    }

    block.Complete();
    block.Completion.Wait();

    Assert.AreEqual(total, processed);
}
So my approach is to throttle the posts, so that the pipeline does not accumulate many messages in its queues.

Below is a simple way to do it. This way Dataflow keeps processing the messages at full speed, but messages do not accumulate, which avoids excessive memory consumption.
// Should be adjusted for the specific use case.
public void PostAsync(Message message)
{
    while (block1.InputCount + ... + blockn.InputCount > 100)
    {
        Thread.Sleep(200);
        // Note: if huge quantities of memory are allocated for each message,
        // the garbage collector may not keep up with the pace.
        // This is the perfect place to force the garbage collector to release memory.
    }
    block1.SendAsync(message);
}
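If blocking a thread with Thread.Sleep is not acceptable, the same throttling idea can be sketched asynchronously. TotalPending() below is a hypothetical helper that sums the InputCount of whichever blocks are being watched (as in the snippet above), and the 100/200 values are the same arbitrary tuning knobs.

// Asynchronous variant of the throttling sketch above (assumptions noted in the comments).
public async Task PostThrottledAsync(Message message)
{
    while (TotalPending() > 100)   // TotalPending(): hypothetical sum of the blocks' InputCount
    {
        await Task.Delay(200);     // yield instead of blocking the calling thread
    }
    await block1.SendAsync(message);
}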
You seem to misunderstand how TPL Dataflow works.

BoundedCapacity limits the number of items you can post into a block. In your case that means a single char into the TransformManyBlock and a single string into the ActionBlock.
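A quick sketch (not from the original answer) of what that input-side bounding looks like: with BoundedCapacity = 1 a second char is declined while the first one is still being processed, no matter how many strings that first char will eventually expand into.

var transform = new TransformManyBlock<char, string>(c =>
{
    Thread.Sleep(1000);                         // keep the single input slot busy
    return Enumerable.Range(0, 1024 * 1024)     // one input expands into many outputs
                     .Select(i => c.ToString());
}, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

Console.WriteLine(transform.Post('A')); // True  - the input slot was free
Console.WriteLine(transform.Post('B')); // False - 'A' still counts against BoundedCapacity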
So you post a single item to the TransformManyBlock, which then returns 1024*1024 strings and tries to pass them on to the ActionBlock, which will only accept a single one at a time. The rest of the strings just sit there in the TransformManyBlock's output queue.
What you probably want to do is create a single block and post items into it in a streaming fashion, waiting (synchronously or otherwise) when its capacity is reached:
private static void Main()
{
    MainAsync().Wait();
}

private static async Task MainAsync()
{
    var block = new ActionBlock<string>(async item =>
    {
        Console.WriteLine(item.Substring(0, 10));
        await Task.Delay(1000);
    }, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

    foreach (var item in GetSequence('A'))
    {
        await block.SendAsync(item);
    }

    block.Complete();
    await block.Completion;
}