How to limit the amount of concurrent async I/O operations?

遇见更好的自我 2020-11-22 01:27
// let's say there is a list of 1000+ URLs
string[] urls = { "http://google.com", "http://yahoo.com", ... };

// now let's send HTTP requests to each of these


        
14 Answers
  •  独厮守ぢ
    2020-11-22 02:00

    Here is a solution that takes advantage of the lazy nature of LINQ. It is functionally equivalent to the accepted answer, but it uses worker-tasks instead of a SemaphoreSlim, which reduces the memory footprint of the whole operation. First, let's make it work without throttling. The first step is to convert our URLs to an enumerable of tasks:

    string[] urls =
    {
        "https://stackoverflow.com",
        "https://superuser.com",
        "https://serverfault.com",
        "https://meta.stackexchange.com",
        // ...
    };
    var httpClient = new HttpClient();
    var tasks = urls.Select(async (url) =>
    {
        return (Url: url, Html: await httpClient.GetStringAsync(url));
    });
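
    The query above is lazy: no request is sent until the enumerable is enumerated, and each task starts at the moment the enumerator yields it. The following is a minimal sketch of that behavior, not part of the original solution. The names LazyDemo and CountStartedAsync and the placeholder inputs are hypothetical, and no network access is involved.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;

    public static class LazyDemo
    {
        // Reports how many tasks had started before and after the query was enumerated.
        public static async Task<(int Before, int After)> CountStartedAsync()
        {
            int started = 0;
            string[] inputs = { "a", "b", "c" }; // placeholder inputs instead of real URLs

            IEnumerable<Task<string>> tasks = inputs.Select(async input =>
            {
                started++;          // runs synchronously when the enumerator yields this task
                await Task.Yield(); // stands in for an asynchronous I/O call
                return input.ToUpperInvariant();
            });

            int before = started;                          // query defined, nothing enumerated yet
            Task<string>[] materialized = tasks.ToArray(); // enumeration starts every task
            int after = started;

            await Task.WhenAll(materialized);
            return (before, after);
        }

        public static async Task Main()
        {
            var (before, after) = await CountStartedAsync();
            Console.WriteLine($"Started before enumeration: {before}, after: {after}");
        }
    }
    ```

    Because the tasks start only during enumeration, whoever controls the pace of the enumeration also controls the level of concurrency.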
    

    The second step is to await all tasks concurrently using the Task.WhenAll method:

    var results = await Task.WhenAll(tasks);
    foreach (var result in results)
    {
        Console.WriteLine($"Url: {result.Url}, {result.Html.Length:#,0} chars");
    }
    

    Output:

    Url: https://stackoverflow.com, 105.574 chars
    Url: https://superuser.com, 126.953 chars
    Url: https://serverfault.com, 125.963 chars
    Url: https://meta.stackexchange.com, 185.276 chars
    ...

    Microsoft's implementation of Task.WhenAll immediately materializes the supplied enumerable into an array, causing all the tasks to start at once. We don't want that, because we want to limit the number of concurrent asynchronous operations. So we need an alternative WhenAll that enumerates our enumerable gradually. We will do that by creating a number of worker-tasks (equal to the desired level of concurrency). Each worker-task enumerates our enumerable one task at a time, using a lock to ensure that each url-task is processed by only one worker-task. Then we await all the worker-tasks, and finally we return the results. Here is the implementation:

    public static async Task<T[]> WhenAll<T>(IEnumerable<Task<T>> tasks,
        int concurrencyLevel)
    {
        if (tasks is ICollection<Task<T>>) throw new ArgumentException(
            "The enumerable should not be materialized.", nameof(tasks));
        var locker = new object();
        var results = new List<T>();
        var failed = false;
        using (var enumerator = tasks.GetEnumerator())
        {
            var workerTasks = Enumerable.Range(0, concurrencyLevel)
            .Select(async _ =>
            {
                try
                {
                    while (true)
                    {
                        Task<T> task;
                        int index;
                        lock (locker)
                        {
                            if (failed) break;
                            if (!enumerator.MoveNext()) break;
                            task = enumerator.Current;
                            index = results.Count;
                            results.Add(default); // Reserve space in the list
                        }
                        var result = await task.ConfigureAwait(false);
                        lock (locker) results[index] = result;
                    }
                }
                catch (Exception)
                {
                    lock (locker) failed = true;
                    throw;
                }
            }).ToArray();
            await Task.WhenAll(workerTasks).ConfigureAwait(false);
        }
        lock (locker) return results.ToArray();
    }
    

    ...and here is what we must change in our initial code, to achieve the desired throttling:

    var results = await WhenAll(tasks, concurrencyLevel: 2);
    

    There is a difference regarding the handling of the exceptions. The native Task.WhenAll waits for all tasks to complete and aggregates all the exceptions. The implementation above terminates promptly after the completion of the first faulted task.
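
    For comparison, here is a minimal sketch of how the built-in Task.WhenAll reports failures. It uses pre-faulted tasks instead of real HTTP requests, and the names WhenAllExceptionDemo and ObserveAsync are hypothetical. Awaiting the combined task rethrows only one exception, while the task's Exception property holds all of them.

    ```csharp
    using System;
    using System.Threading.Tasks;

    public static class WhenAllExceptionDemo
    {
        // Returns the type name of the exception rethrown by await,
        // and the total number of exceptions collected by Task.WhenAll.
        public static async Task<(string AwaitedType, int Total)> ObserveAsync()
        {
            Task<int>[] tasks =
            {
                Task.FromException<int>(new InvalidOperationException("first")),
                Task.FromException<int>(new TimeoutException("second")),
                Task.FromResult(42),
            };
            Task<int[]> whenAll = Task.WhenAll(tasks);
            try
            {
                await whenAll;
                return ("none", 0);
            }
            catch (Exception ex)
            {
                // await unwraps the AggregateException and rethrows a single exception;
                // the combined task still exposes every failure.
                return (ex.GetType().Name, whenAll.Exception.InnerExceptions.Count);
            }
        }

        public static async Task Main()
        {
            var (awaitedType, total) = await ObserveAsync();
            Console.WriteLine($"Rethrown by await: {awaitedType}, collected in total: {total}");
        }
    }
    ```

    With the throttled WhenAll above, url-tasks that were never started because of an earlier failure cannot contribute exceptions, so a caller should expect fewer reported errors than with the built-in method.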
