How to mark a TPL dataflow cycle to complete?

Submitted by 空扰寡人 on 2019-12-03 16:53:04

If the purpose of your code is to traverse the directory structure with some degree of parallelism, then I would suggest using Microsoft's Reactive Extensions (Rx) instead of TPL Dataflow. I think it becomes much simpler.

Here's how I would do it.

First define a recursive function to build the list of directories:

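// Namespaces used by the snippets in this answer.
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Reactive.Concurrency;
using System.Reactive.Linq;

Func<DirectoryInfo, IObservable<DirectoryInfo>> recurse = null;
recurse = di =>
    Observable
        .Return(di)
        .Concat(di.GetDirectories()
            .ToObservable()
            .SelectMany(di2 => recurse(di2)))
        .ObserveOn(Scheduler.Default);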

This recurses through the directory tree. The ObserveOn(Scheduler.Default) call delivers the results on the default Rx scheduler (the thread pool), so the downstream work runs off the calling thread rather than blocking it.

So by calling recurse with an input DirectoryInfo I get an observable sequence of the input directory and all of its descendants.

Now I can build a fairly straightforward query to get the results I want:

var query =
    from di in recurse(new DirectoryInfo(@"C:\dev\kortforsyningen_dsm\tiles"))
    from fi in di.GetFiles().ToObservable()
    let zxy =
        fi
            .FullName
            .Split('\\')
            .Reverse()
            .Take(3)
            .Reverse()
            .Select(s => int.Parse(Path.GetFileNameWithoutExtension(s)))
            .ToArray()
    let suffix = String.Format("{0}/{1}/{2}.png", zxy[0], zxy[1], zxy[2])
    select new FileInfo(Path.Combine(di.FullName, suffix));

Now I can run the query like this:

query
    .Subscribe(s =>
    {
        Trace.TraceInformation("Done combining : {0}", s.Name);
    });

I may have missed a little of your custom logic, but if this is an approach you want to take, I'm sure you can fix any logical issues quite easily.

This code automatically handles completion when it runs out of child directories and files.
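To make the completion behaviour concrete, here is a minimal sketch, assuming the System.Reactive package is referenced. The class and method names (RxCompletionDemo, RunToCompletionAsync, CollectAsync) are made up for illustration and are not part of Rx. It shows two ways to find out that a query such as the one above has finished: passing an onCompleted callback to Subscribe, or awaiting the observable.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Threading.Tasks;

public static class RxCompletionDemo
{
    // Hypothetical helper, option 1: subscribe with onNext, onError and
    // onCompleted callbacks and expose the end of the sequence as a Task.
    public static Task RunToCompletionAsync<T>(IObservable<T> source, Action<T> onNext)
    {
        var done = new TaskCompletionSource<bool>();
        source.Subscribe(
            onNext,
            ex => done.TrySetException(ex),   // the sequence failed
            () => done.TrySetResult(true));   // the sequence ran out of items
        return done.Task;
    }

    // Hypothetical helper, option 2: await the observable directly.
    // Awaiting yields the last element, and ToList() makes that last
    // element the list of everything the sequence produced.
    public static async Task<IList<T>> CollectAsync<T>(IObservable<T> source)
    {
        return await source.ToList();
    }
}
```

With the query from this answer, that would look like await RxCompletionDemo.RunToCompletionAsync(query, fi => Trace.TraceInformation("Done combining : {0}", fi.Name)); the task finishes once every directory and file has been delivered.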

To add Rx to your project, look for "System.Reactive" in NuGet (older releases were published as "Rx-Main").

I don't see any way this can be done, because each block (dirBroadcast and tileFilder) depends on the other one and can't complete on its own.

I suggest you redesign your directory traversal without TPL Dataflow, which isn't a good fit for this kind of problem. A better approach in my opinion would simply be to recursively scan the directories and fill your block with a stream of files:

private static void FillBlock(DirectoryInfo directoryInfo, XYZTileCombinerBlock<FileInfo> block)
{
    foreach (var fileInfo in directoryInfo.GetFiles())
    {
        block.Post(fileInfo);
    }

    foreach (var subDirectory in directoryInfo.GetDirectories())
    {
        FillBlock(subDirectory, block);
    }
}

FillBlock(directory, block);
block.Complete();
await block.Completion;
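Since XYZTileCombinerBlock is a custom type, here is a self-contained sketch of the same fill-then-complete pattern with a plain ActionBlock<FileInfo> standing in for it. The names FillBlockDemo and ProcessTreeAsync are made up for illustration, and the sketch assumes System.Threading.Tasks.Dataflow is available to the project.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class FillBlockDemo
{
    // Recursively posts every file below directoryInfo to the target block.
    private static void FillBlock(DirectoryInfo directoryInfo, ITargetBlock<FileInfo> block)
    {
        foreach (var fileInfo in directoryInfo.EnumerateFiles())
        {
            block.Post(fileInfo);
        }

        foreach (var subDirectory in directoryInfo.EnumerateDirectories())
        {
            FillBlock(subDirectory, block);
        }
    }

    // Hypothetical helper: processes all files under root and returns
    // how many files the block handled.
    public static async Task<int> ProcessTreeAsync(string root, int maxParallel = 4)
    {
        var processed = 0;

        // Stand-in for the custom block: it only counts the files it receives.
        var block = new ActionBlock<FileInfo>(
            fi => { Interlocked.Increment(ref processed); },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallel });

        FillBlock(new DirectoryInfo(root), block); // the traversal is the only producer
        block.Complete();                          // no more input will arrive
        await block.Completion;                    // wait until queued files are processed
        return processed;
    }
}
```

Because the traversal is the only producer and nothing feeds back into the block, there is no cycle, and Complete can be called as soon as the recursion returns. If the block is given a BoundedCapacity, Post can return false when the block is full; SendAsync waits for room instead.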

I am sure this is not always possible, but in many cases (including directory enumeration) you can use a running counter and the Interlocked functions to have a cyclic one-to-many dataflow that completes:

public static ISourceBlock<string> GetDirectoryEnumeratorBlock(string path, int maxParallel = 5)
{
    var outputBuffer = new BufferBlock<string>();

    var count = 1;

    var broadcastBlock = new BroadcastBlock<string>(s => s);

    var getDirectoriesBlock = new TransformManyBlock<string, string>(d =>
    {
        var subDirs = Directory.EnumerateDirectories(d).ToList();

        // Add the subdirectory count, minus 1 for the current directory, and test the
        // value returned by Interlocked.Add so the update and the check are one atomic step.
        if (Interlocked.Add(ref count, subDirs.Count - 1) == 0) // 0 means all directories have been enumerated.
            broadcastBlock.Complete();

        return subDirs;

    }, new ExecutionDataflowBlockOptions() { MaxDegreeOfParallelism = maxParallel });

    broadcastBlock.LinkTo(outputBuffer, new DataflowLinkOptions() { PropagateCompletion = true });
    broadcastBlock.LinkTo(getDirectoriesBlock, new DataflowLinkOptions() { PropagateCompletion = true });

    getDirectoriesBlock.LinkTo(broadcastBlock);

    getDirectoriesBlock.Post(path);

    return outputBuffer;
}

I have used this, with a slight modification, to enumerate files, and it works well. Be careful with the max degree of parallelism: it can quickly saturate a network file system!
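The counter idea can also be tried without touching the file system. The following is a minimal sketch under stated assumptions: the tree is an in-memory dictionary of node to children with no loops, and the names PendingCounterDemo and WalkAsync are made up for illustration. A single ActionBlock posts work back to itself and completes when the in-flight count returned by Interlocked.Add reaches zero.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class PendingCounterDemo
{
    // Hypothetical helper: visits every node reachable from root, where the
    // tree is given as "node -> children". Nodes without an entry are leaves.
    public static async Task<List<string>> WalkAsync(
        IReadOnlyDictionary<string, string[]> tree, string root, int maxParallel = 4)
    {
        var visited = new ConcurrentQueue<string>();
        var pending = 1; // the root is the one item in flight at the start

        ActionBlock<string> worker = null;
        worker = new ActionBlock<string>(node =>
        {
            visited.Enqueue(node);
            var children = tree.TryGetValue(node, out var c) ? c : Array.Empty<string>();

            // Count the children in and this node out in one atomic step, and
            // keep the value returned by that same operation.
            var remaining = Interlocked.Add(ref pending, children.Length - 1);

            foreach (var child in children)
            {
                worker.Post(child); // the cycle: the block feeds itself
            }

            if (remaining == 0) // nothing left in flight, so completing is safe
            {
                worker.Complete();
            }
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallel });

        worker.Post(root);
        await worker.Completion;
        return visited.ToList();
    }
}
```

The count can only reach zero when the last node processed has no children, so nothing is posted after Complete is called. The order of the returned nodes depends on scheduling when maxParallel is greater than 1.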

Just to show what I actually ended up with: a combination of TPL Dataflow and Rx.

Func<DirectoryInfo, IObservable<DirectoryInfo>> recurse = null;
recurse = di =>
    Observable
        .Return(di)
        .Concat(di.GetDirectories()
            .Where(d => int.Parse(d.Name) <= br_tile[0] && int.Parse(d.Name) >= tl_tile[0])
            .ToObservable()
            .SelectMany(di2 => recurse(di2)))
        .ObserveOn(Scheduler.Default);

var query =
    from di in recurse(new DirectoryInfo(Path.Combine(directory.FullName, baselvl.ToString())))
    from fi in di.GetFiles()
        .Where(f => int.Parse(Path.GetFileNameWithoutExtension(f.Name)) >= br_tile[1]
            && int.Parse(Path.GetFileNameWithoutExtension(f.Name)) <= tl_tile[1])
        .ToObservable()
    select fi;

// AsObserver() forwards OnCompleted to block.Complete(), so the block is completed
// once the query has delivered its last file. Calling block.Complete() here as well
// could close the block before the asynchronously delivered files arrive.
query.Subscribe(block.AsObserver());
Console.WriteLine("Done subscribing");

block.Completion.Wait();
Console.WriteLine("Done TPL Block");

Here, block is an instance of my own XYZTileCombinerBlock<FileInfo>.
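The Rx-to-Dataflow bridge can be tried in isolation. This is a minimal sketch, assuming both System.Reactive and System.Threading.Tasks.Dataflow are referenced; an ActionBlock<int> stands in for the custom block, and the names RxToDataflowDemo and PumpAsync are made up for illustration. The observer returned by DataflowBlock.AsObserver passes each OnNext value to the block and calls Complete on it when the observable signals OnCompleted, so the block's Completion task can simply be awaited.

```csharp
using System;
using System.Reactive.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class RxToDataflowDemo
{
    // Hypothetical helper: pumps an observable into a block, waits for the
    // block to drain, and returns the sum of everything it received.
    public static async Task<int> PumpAsync(IObservable<int> source)
    {
        var sum = 0;

        // The default MaxDegreeOfParallelism is 1, so this delegate never
        // runs concurrently with itself and the plain += is safe.
        var block = new ActionBlock<int>(n => sum += n);

        // OnNext goes to the block, OnCompleted becomes block.Complete(),
        // and OnError faults the block.
        source.Subscribe(block.AsObserver());

        await block.Completion; // finishes after the last item has been processed
        return sum;
    }
}
```

For example, await RxToDataflowDemo.PumpAsync(Observable.Range(1, 4)) pushes 1, 2, 3 and 4 through the block before the awaited task finishes.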
