Make an IObservable subscription concurrent

99封情书 submitted on 2021-02-08 08:28:24

Question


I have the following code

string dataDirectory = _settingsProvider.DataSettings.BaseDirectory;
_solverManagementService.MergedPointCloudProducer(dataDirectory, cancellationToken)
    .Subscribe(PointCloudMergerCompleted);

where the SolverManagementService _solverManagementService is

public class SolverManagementService : ISolverManagementService
{
    public IObservable<IPointCloud> MergedPointCloudProducer(string dataDirectory,
        CancellationToken token)
    {
        return Observable.Create<IPointCloud>(
            observer =>
            {
                PairCollectionProducer(dataDirectory, token)
                    .Subscribe(pairCollection =>
                    {
                        // .Result blocks this thread until the merge has finished.
                        observer.OnNext(_icpBatchSolverService.RecursivelyMergeAsync(
                            pairCollection, token).Result);
                    },
                    onCompleted: () =>
                    {
                        observer.OnCompleted();
                    });
                return () => { };
            });
    }
    ... // Other methods. 
}

But here _icpBatchSolverService.RecursivelyMergeAsync(pairCollection, token) is expensive, and although it returns a Task<IPointCloud>, I do not run it on another thread, so the call blocks. Because RecursivelyMergeAsync returns a Task<IPointCloud>, it can be awaited, so I have amended the code to use async/await:

public IObservable<IPointCloud> MergedPointCloudProducer(string dataDirectory,
    CancellationToken token)
{
    return Observable.Create<IPointCloud>(
        observer =>
        {
            PairCollectionProducer(dataDirectory, token)
                .Subscribe(async (pairCollection) =>
                {
                    observer.OnNext(await _icpBatchSolverService.RecursivelyMergeAsync(
                        pairCollection, token));
                },
                onCompleted: () =>
                {
                    observer.OnCompleted();
                });
            return () => { };
        });
}

but now it returns immediately and the console app shuts down. I am sure this can be done without semaphores, but I am new to Rx. How can I run RecursivelyMergeAsync concurrently for each returned pairCollection without blocking, and get a notification when all the recursive merges have completed?

Note. In a unit test, I do the following

public class IcpBatchSolverServiceTests
{
    private Mock<ISettingsProvider> _mockSettingsProvider; 
    private IIcpBatchSolverService _icpBatchSolverService;

    [OneTimeSetUp]
    public void Setup()
    {
        _mockSettingsProvider = new Mock<ISettingsProvider>();

        _mockSettingsProvider.Setup(m => m.IcpSolverSettings).Returns(new IcpSolverSettings());
        _mockSettingsProvider.Object.IcpSolverSettings.MaximumDegreeOfParallelism = 6;

        Log.Logger = new LoggerConfiguration()
            .WriteTo.Console()
            .CreateLogger();

        var serviceProvider = new ServiceCollection()
            .AddLogging(builder =>
            {
                builder.SetMinimumLevel(LogLevel.Trace);
                builder.AddSerilog(Log.Logger);
            })
            .BuildServiceProvider();

        ILogger<IcpBatchSolverServiceTests> logger = serviceProvider
            .GetService<ILoggerFactory>()
            .CreateLogger<IcpBatchSolverServiceTests>();

        _icpBatchSolverService = new IcpBatchSolverService(_mockSettingsProvider.Object, logger);
    }

    [Test]
    public async Task CanSolveBatchAsync()
    {
        IPointCloud @static = PointCloudFactory.GetRandomPointCloud(1000);
        List<IPointCloud> pointCloudList = PointCloudFactory.GenerateRandomlyRotatedBatch(@static, 12);

        IPartitioningService<IPointCloud> ps = new PointCloudPartitioningService();
        IPointCloud result = await _icpBatchSolverService.RecursivelyMergeAsync(ps.Partition(pointCloudList), CancellationToken.None);

        Assert.AreEqual(@static.Vertices.Length, result.Vertices.Length);
    }
}

And this processes concurrently perfectly.


Edit. Outline of the processing I need to do when given a folder of files for different geometries (depth maps of different geometries at different angles), named with the convention .NNNN.exr, where NNNN is some numeric value. For a batch of files:

  1. Batch these files into collections, one per geometry, using the file name.

For each file batch:

  1. [*Serial*] Call the C++ API to extract DepthMaps from the image files.
  2. [*Parallel*] Convert the DepthMaps to PointClouds. This can be done all at once.
  3. [*Parallel*] Merge the PointClouds using the ICP algorithm (expensive), but limit concurrency with a TaskScheduler to two threads (chosen depending on machine architecture, memory, etc.).

At the end of this I make another call to the C++ API with the merged point cloud from step 3. So in Rx my current full pipeline looks like this:

public class SolverManagementService : ISolverManagementService
{
    private readonly IIcpBatchSolverService _icpBatchSolverService;
    private readonly IDepthMapToPointCloudAdapter _pointCloudAdapter;
    private readonly ILogger<SolverManagementService> _logger;

    public SolverManagementService(
        IIcpBatchSolverService icpBatchSolverService,
        IDepthMapToPointCloudAdapter pointCloudAdapter,
        ILogger<SolverManagementService> logger)
    {
        // The single-string ArgumentNullException constructor expects the parameter name.
        _icpBatchSolverService = icpBatchSolverService ?? throw new ArgumentNullException(nameof(icpBatchSolverService));
        _pointCloudAdapter = pointCloudAdapter ?? throw new ArgumentNullException(nameof(pointCloudAdapter));
        _logger = logger; 
    }

    public IObservable<IPointCloud> MergedPointCloudProducer(string dataDirectory, CancellationToken token)
    {
        return Observable.Create<IPointCloud>(
            observer =>
            {
                PairCollectionProducer(dataDirectory, token)
                    .Subscribe(pairCollection =>
                    {
                        observer.OnNext(_icpBatchSolverService.RecursivelyMergeAsync(pairCollection, token).Result);
                    },
                    onCompleted: () =>
                    {
                        observer.OnCompleted();
                    });
                return () => { };
            });
    }

    public IObservable<PairCollection<IPointCloud>> PairCollectionProducer(string dataDirectory, CancellationToken token)
    {
        return Observable.Create<PairCollection<IPointCloud>>(
            observer =>
            {
                Parallel.ForEach(
                    Utils.GetFileBatches(dataDirectory), 
                    (fileBatch) =>
                {
                    var producer = RawDepthMapProducer(fileBatch, token);
                    ConcurrentBag<IPointCloud> bag = new ConcurrentBag<IPointCloud>();

                    producer.Subscribe(rawDepthMap =>
                    {
                        bag.Add(_pointCloudAdapter.GetPointCloudFromDepthMap(rawDepthMap));
                        _logger?.LogDebug($"Thread {Thread.CurrentThread.ManagedThreadId}: {bag.Count:N0} PointCloud(s) added to concurrent bag");
                    }, 
                    onCompleted: () =>
                    {
                        PointCloudPartitioningService ps = new PointCloudPartitioningService();
                        observer.OnNext(ps.Partition(bag.ToList()));

                        _logger?.LogDebug($"Thread {Thread.CurrentThread.ManagedThreadId}: PointCloud PairCollection generated " +
                            $"for file set \"{Path.GetFileNameWithoutExtension(bag.FirstOrDefault().Source)}\"");
                    });
                });
                observer.OnCompleted();
                return () => { };
            });
    }

    public IObservable<RawDepthMap> RawDepthMapProducer(List<string> filePaths, CancellationToken token)
    {
        return Observable.Create<RawDepthMap>(
            observer =>
            {
                int index = 0;
                foreach(var filePath in filePaths)
                {
                    token.ThrowIfCancellationRequested();
                    var extractor = DepthMapExtractorFactory.GetDepthMapExtractor(filePath);

                    observer.OnNext(extractor.GetDepthMap(filePath, index++));
                    _logger?.LogDebug($"Thread {Thread.CurrentThread.ManagedThreadId}: DepthMap extracted from \"{filePath}\"");
                }
                observer.OnCompleted();
                return () => { };
            });
    }
}

I am seeking answers to two questions: 1. What is wrong with my code above? Note that _icpBatchSolverService.RecursivelyMergeAsync returns a Task<IPointCloud> and is itself concurrent, and I would like these calls to run concurrently. 2. What else is wrong with my code?


Answer 1:


I'm going to leave a generic answer, because the code above is too extensive to boil down.

There are two syntaxes which may be used to define asynchronous behavior. The first is the async/await pattern and the second, and older, is the Subscribe() pattern (reactive).

Is asynchronous the same thing as concurrent?

No, it is definitely not. For those reading this who don't know, asynchronous means "it happens later," not "it happens concurrently." With either of these syntaxes, you're defining behavior that runs once some condition has been met. A very common use case is handling a response coming back from a web server: you make the request, then do something when the response comes back.
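The distinction can be made concrete with a minimal sketch that uses only Task from the base class library. WorkAsync and Gauge are hypothetical stand-ins invented for this illustration; they are not part of the question's code. The sketch counts how many operations are in flight at once when they are awaited one after another, and when they are all started before any is awaited.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncVersusConcurrent
{
    // Hypothetical helper: tracks how many operations are in flight at once.
    private sealed class Gauge
    {
        private readonly object _gate = new object();
        private int _inFlight;
        public int Max { get; private set; }

        public void Enter()
        {
            lock (_gate) { _inFlight++; Max = Math.Max(Max, _inFlight); }
        }

        public void Exit()
        {
            lock (_gate) { _inFlight--; }
        }
    }

    // Hypothetical stand-in for an expensive asynchronous operation.
    private static async Task WorkAsync(Gauge gauge)
    {
        gauge.Enter();
        await Task.Delay(200);
        gauge.Exit();
    }

    // Asynchronous but not concurrent: each call starts only after the previous one finished.
    public static async Task<int> MaxInFlightSequentialAsync()
    {
        var gauge = new Gauge();
        for (int i = 0; i < 3; i++)
        {
            await WorkAsync(gauge);
        }
        return gauge.Max;
    }

    // Concurrent: all three calls are started (fork) before any is awaited (join).
    public static async Task<int> MaxInFlightConcurrentAsync()
    {
        var gauge = new Gauge();
        Task[] tasks = Enumerable.Range(0, 3).Select(_ => WorkAsync(gauge)).ToArray();
        await Task.WhenAll(tasks);
        return gauge.Max;
    }

    public static async Task Main()
    {
        int sequential = await MaxInFlightSequentialAsync();
        int concurrent = await MaxInFlightConcurrentAsync();
        Console.WriteLine($"sequential: {sequential}");
        Console.WriteLine($"concurrent: {concurrent}");
    }
}
```

Under these assumptions the sequential version should report 1 operation in flight and the concurrent version 3. Both versions use await; only the second one overlaps the work.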

Concurrency is different. You might invoke concurrency by using Task.Run() or Parallel.ForEach(), for example. In both cases, you're defining a fork. In the case of Task.Run, you might then later do a Task.WaitAll. In the case of the Parallel.ForEach, it will do the fork/join for you. Of course, reactive has its own set of fork/join operations.
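One of the reactive fork/join shapes can be sketched as follows, assuming the System.Reactive package is referenced. Each item is wrapped with Observable.FromAsync, and Merge with a maxConcurrent argument subscribes to only that many inner observables at a time. ExpensiveAsync is a hypothetical stand-in for a method such as RecursivelyMergeAsync, and the int payload is chosen purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedMergeSketch
{
    // Hypothetical stand-in for an expensive method that returns Task<T>.
    private static async Task<int> ExpensiveAsync(int input, CancellationToken token)
    {
        await Task.Delay(100, token);
        return input * 10;
    }

    public static IObservable<int> Process(IObservable<int> source, int maxConcurrent)
    {
        return source
            // Fork: wrap each item in a cold observable; the task is not
            // started until Merge subscribes to that inner observable.
            .Select(item => Observable.FromAsync<int>(ct => ExpensiveAsync(item, ct)))
            // Join: flatten the results, keeping at most maxConcurrent
            // inner subscriptions active at any time.
            .Merge(maxConcurrent);
    }

    public static async Task Main()
    {
        // Awaiting the observable waits for OnCompleted, which is raised
        // only after every inner task has finished.
        IList<int> results = await Process(Observable.Range(1, 6), 2).ToList();

        // Results arrive in completion order, so sort before printing.
        Console.WriteLine(string.Join(", ", results.OrderBy(x => x)));
    }
}
```

In the question's terms, the same shape would be roughly PairCollectionProducer(dataDirectory, token).Select(pc => Observable.FromAsync(ct => _icpBatchSolverService.RecursivelyMergeAsync(pc, ct))).Merge(2). That is an untested sketch, not a drop-in fix. The completion notification then comes from the merged observable itself, so no manual OnCompleted forwarding is needed.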

What happens when I await or subscribe?

The following two lines of code both have the same behavior, and that behavior confuses a good number of programmers:

var result = await myAsync();

myObservable.Subscribe(result => { ... });

In both cases, the control flow of the program moves in a predictable, but potentially-confusing fashion. In the first case, control flow returns back to the parent caller while the await is being awaited. In the second, control flow moves on to the next line of code, with the lambda expression being called upon the return of the result.
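The Subscribe half of that description can be observed with a short sketch, assuming the System.Reactive package is referenced. Observable.Timer is used here only as a convenient source that produces its value later on another thread. A source built with Observable.Create whose body does all its work synchronously, as in the question, may instead run to completion before Subscribe returns.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Threading;

public static class SubscribeControlFlow
{
    public static List<string> Run()
    {
        var log = new List<string>();
        using (var done = new ManualResetEventSlim())
        {
            // Emits a single value on a background scheduler after the delay.
            IObservable<long> source = Observable.Timer(TimeSpan.FromMilliseconds(200));

            IDisposable subscription = source.Subscribe(
                _ => { lock (log) { log.Add("lambda ran"); } },
                () => done.Set());

            // Reached immediately: Subscribe does not wait for the value.
            lock (log) { log.Add("Subscribe returned"); }

            // Without this wait, a console app could exit before the lambda runs.
            done.Wait();
            subscription.Dispose();
        }
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" -> ", Run()));
    }
}
```

The log should read "Subscribe returned -> lambda ran". The explicit wait at the end is what keeps the process alive until the notification arrives.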

A common mistake among people learning these patterns is to assign, from within the lambda, to a variable in the enclosing scope and then read that variable right after the Subscribe() call. This isn't going to work as intended, because the enclosing code carries on, and often returns, long before the lambda is executed, so the variable still holds its initial value when it is read. You are less likely to make this mistake with async/await, but you also have to remember that control flow goes back up the call stack, through every awaiting caller, until it reaches a caller that does not await. This article explains it in a little more depth, and this article is a little easier to understand.
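A small sketch of that pitfall and one possible alternative, assuming the System.Reactive package is referenced. Source is a hypothetical observable invented for the illustration. The alternative relies on the fact that an IObservable<T> can be awaited directly, which yields its last value once it completes.

```csharp
using System;
using System.Reactive.Linq;
using System.Threading.Tasks;

public static class GettingAValueOut
{
    // Hypothetical source: emits 42 once, after a short delay.
    private static IObservable<int> Source()
    {
        return Observable.Timer(TimeSpan.FromMilliseconds(200)).Select(_ => 42);
    }

    // The pitfall: the variable is read before the lambda has assigned it.
    public static int ReadTooEarly()
    {
        int captured = 0;
        // The subscription is disposed on return to keep the sketch tidy.
        using (Source().Subscribe(value => captured = value))
        {
            return captured; // still 0: the timer has not fired yet
        }
    }

    // Awaiting the observable yields its last value once it completes.
    public static async Task<int> AwaitInstead()
    {
        return await Source();
    }

    public static async Task Main()
    {
        Console.WriteLine(ReadTooEarly());
        Console.WriteLine(await AwaitInstead());
    }
}
```

This should print 0 and then 42. The await version hands the value to the caller only after it exists, so no shared variable is needed.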



Source: https://stackoverflow.com/questions/59743240/make-an-iobservable-subscription-concurrent