// let's say there is a list of 1000+ URLs
string[] urls = { "http://google.com", "http://yahoo.com", ... };
// now let's send HTTP requests to each of these
Here is a solution that takes advantage of the lazy nature of LINQ. It is functionally equivalent to the accepted answer, but uses worker-tasks instead of a SemaphoreSlim, thereby reducing the memory footprint of the whole operation. First, let's make it work without throttling. The first step is to project our urls into a lazy enumerable of tasks.
string[] urls =
{
    "https://stackoverflow.com",
    "https://superuser.com",
    "https://serverfault.com",
    "https://meta.stackexchange.com",
    // ...
};
var httpClient = new HttpClient();
// Deferred execution: no request is sent until `tasks` is enumerated.
var tasks = urls.Select(async (url) =>
{
    return (Url: url, Html: await httpClient.GetStringAsync(url));
});
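The laziness can be checked without network access. This minimal sketch assumes C# 9 or later (top-level statements) and replaces `httpClient.GetStringAsync` with a hypothetical `FakeGetStringAsync` built on `Task.Delay`; the `started` counter is illustrative instrumentation only.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for httpClient.GetStringAsync, so the sketch runs offline.
static async Task<string> FakeGetStringAsync(string url)
{
    await Task.Delay(10);
    return $"<html>{url}</html>";
}

int started = 0;
string[] urls = { "https://stackoverflow.com", "https://superuser.com", "https://serverfault.com" };

var tasks = urls.Select(async url =>
{
    started++; // synchronous part of the lambda: runs only when the element is enumerated
    return (Url: url, Html: await FakeGetStringAsync(url));
});

int afterSelect = started;              // 0: Select alone starts nothing
var firstTwo = tasks.Take(2).ToArray(); // enumerating invokes the selector for two elements
int afterTake = started;                // 2: only the enumerated elements were started
await Task.WhenAll(firstTwo);

Console.WriteLine($"After Select: {afterSelect}, after Take(2): {afterTake}");
```

Keep in mind that every enumeration of such an enumerable invokes the selector again and starts fresh operations, so it should be enumerated exactly once.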
The second step is to await all tasks concurrently using the Task.WhenAll method:
var results = await Task.WhenAll(tasks);
foreach (var result in results)
{
    Console.WriteLine($"Url: {result.Url}, {result.Html.Length:#,0} chars");
}
Output:
Url: https://stackoverflow.com, 105.574 chars
Url: https://superuser.com, 126.953 chars
Url: https://serverfault.com, 125.963 chars
Url: https://meta.stackexchange.com, 185.276 chars
...
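To see how Task.WhenAll treats a lazy enumerable, this sketch counts how many tasks have already been started by the time the call returns, before anything is awaited. It assumes C# 9 or later (top-level statements) and uses `Task.Delay` as a stand-in for the HTTP requests; the 100 simulated operations are arbitrary.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

int started = 0;

// 100 simulated operations; Task.Delay stands in for GetStringAsync (assumption).
var tasks = Enumerable.Range(1, 100).Select(async i =>
{
    started++; // synchronous part of the lambda, runs when the element is enumerated
    await Task.Delay(100);
    return i;
});

Task<int[]> whenAllTask = Task.WhenAll(tasks);
int startedImmediately = started; // WhenAll has already enumerated the whole sequence
int[] results = await whenAllTask;

Console.WriteLine($"Started before the first await: {startedImmediately}"); // 100
Console.WriteLine($"Completed: {results.Length}");                          // 100
```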
Microsoft's implementation of Task.WhenAll immediately materializes the supplied enumerable into an array, causing all tasks to start at once. We don't want that, because we want to limit the number of concurrent asynchronous operations. So we need an alternative WhenAll that enumerates our enumerable lazily, one element at a time. We do that by creating a number of worker-tasks (equal to the desired level of concurrency). Each worker-task pulls the next task from the shared enumerator, using a lock to ensure that each url-task is processed by only one worker-task. Then we await all worker-tasks to complete, and finally we return the results in the order of the source enumerable. Here is the implementation:
// Throttled alternative to Task.WhenAll: `tasks` is enumerated lazily,
// so at most `concurrencyLevel` tasks are running at any given time.
public static async Task<TResult[]> WhenAll<TResult>(IEnumerable<Task<TResult>> tasks,
    int concurrencyLevel)
{
    if (tasks is ICollection<Task<TResult>>) throw new ArgumentException(
        "The enumerable should not be materialized.", nameof(tasks));
    var locker = new object();
    var results = new List<TResult>();
    var failed = false;
    using (var enumerator = tasks.GetEnumerator())
    {
        var workerTasks = Enumerable.Range(0, concurrencyLevel)
        .Select(async _ =>
        {
            try
            {
                while (true)
                {
                    Task<TResult> task;
                    int index;
                    lock (locker)
                    {
                        if (failed) break;
                        if (!enumerator.MoveNext()) break; // MoveNext starts the next task
                        task = enumerator.Current;
                        index = results.Count;
                        results.Add(default); // Reserve space in the list
                    }
                    var result = await task.ConfigureAwait(false);
                    lock (locker) results[index] = result;
                }
            }
            catch (Exception)
            {
                lock (locker) failed = true; // Stop the other workers from starting new tasks
                throw;
            }
        }).ToArray();
        await Task.WhenAll(workerTasks).ConfigureAwait(false);
    }
    lock (locker) return results.ToArray();
}
...and here is what we must change in our initial code to achieve the desired throttling:
var results = await WhenAll(tasks, concurrencyLevel: 2);
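A self-contained way to check the throttling without network access is to replace the HTTP requests with `Task.Delay` and record the peak number of operations in flight. This sketch assumes C# 9 or later (top-level statements); `gate`, `current` and `maxObserved` are hypothetical instrumentation, not part of the answer, and `WhenAll` is a copy of the method above declared as a static local function.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Simulated workload (assumption): Task.Delay stands in for the HTTP requests.
var gate = new object();
int current = 0, maxObserved = 0;

var workload = Enumerable.Range(1, 10).Select(async i =>
{
    lock (gate) { current++; maxObserved = Math.Max(maxObserved, current); }
    await Task.Delay(50);
    lock (gate) current--; // runs before this task completes
    return i * i;
});

int[] squares = await WhenAll(workload, concurrencyLevel: 2);
Console.WriteLine($"Peak concurrency: {maxObserved}"); // never more than 2
Console.WriteLine(string.Join(", ", squares));        // 1, 4, 9, 16, 25, 36, 49, 64, 81, 100

// Copy of the throttled WhenAll above.
static async Task<TResult[]> WhenAll<TResult>(IEnumerable<Task<TResult>> tasks,
    int concurrencyLevel)
{
    if (tasks is ICollection<Task<TResult>>) throw new ArgumentException(
        "The enumerable should not be materialized.", nameof(tasks));
    var locker = new object();
    var results = new List<TResult>();
    var failed = false;
    using (var enumerator = tasks.GetEnumerator())
    {
        var workerTasks = Enumerable.Range(0, concurrencyLevel).Select(async _ =>
        {
            try
            {
                while (true)
                {
                    Task<TResult> task;
                    int index;
                    lock (locker)
                    {
                        if (failed) break;
                        if (!enumerator.MoveNext()) break;
                        task = enumerator.Current;
                        index = results.Count;
                        results.Add(default); // reserve the slot, preserving source order
                    }
                    var result = await task.ConfigureAwait(false);
                    lock (locker) results[index] = result;
                }
            }
            catch (Exception)
            {
                lock (locker) failed = true;
                throw;
            }
        }).ToArray();
        await Task.WhenAll(workerTasks).ConfigureAwait(false);
    }
    lock (locker) return results.ToArray();
}
```

Because a worker pulls the next task only after awaiting the previous one, and each task decrements the counter before it completes, the counter cannot exceed the concurrency level. The results keep the order of the source enumerable because each slot is reserved before its task is awaited.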
There is a difference regarding the handling of exceptions. The native Task.WhenAll waits for all tasks to complete and aggregates all the exceptions. The implementation above stops starting new tasks as soon as the first task fails, waits only for the tasks that are already in flight, and then propagates a single exception instead of aggregating them all.
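The difference in exception handling can be observed with a hypothetical workload, `MakeTasks`, in which every even-numbered operation fails after a short delay. This sketch assumes C# 9 or later (top-level statements), uses `Task.Delay` in place of HTTP requests, and counts how many operations each variant starts; `WhenAll` is again a copy of the throttled method above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical workload: every even-numbered operation fails after a short delay.
int started = 0;
IEnumerable<Task<int>> MakeTasks() => Enumerable.Range(1, 10).Select(async i =>
{
    started++; // synchronous part; both variants enumerate from one thread at a time
    await Task.Delay(20);
    if (i % 2 == 0) throw new InvalidOperationException($"Operation {i} failed");
    return i;
});

// Native Task.WhenAll: starts all operations, waits for all, aggregates every failure.
Task<int[]> native = Task.WhenAll(MakeTasks());
try { await native; } catch (InvalidOperationException) { }
int nativeStarted = started;                                     // 10
int nativeErrors = native.Exception?.InnerExceptions.Count ?? 0; // 5

// Throttled WhenAll: stops pulling new tasks once a failure is observed.
started = 0;
Exception? caught = null;
try { await WhenAll(MakeTasks(), concurrencyLevel: 2); }
catch (Exception ex) { caught = ex; }
int throttledStarted = started; // only the first few, not all 10

Console.WriteLine($"Native: {nativeStarted} started, {nativeErrors} errors");
Console.WriteLine($"Throttled: {throttledStarted} started, {caught?.GetType().Name}");

// Copy of the throttled WhenAll above.
static async Task<TResult[]> WhenAll<TResult>(IEnumerable<Task<TResult>> tasks,
    int concurrencyLevel)
{
    if (tasks is ICollection<Task<TResult>>) throw new ArgumentException(
        "The enumerable should not be materialized.", nameof(tasks));
    var locker = new object();
    var results = new List<TResult>();
    var failed = false;
    using (var enumerator = tasks.GetEnumerator())
    {
        var workerTasks = Enumerable.Range(0, concurrencyLevel).Select(async _ =>
        {
            try
            {
                while (true)
                {
                    Task<TResult> task;
                    int index;
                    lock (locker)
                    {
                        if (failed) break;
                        if (!enumerator.MoveNext()) break;
                        task = enumerator.Current;
                        index = results.Count;
                        results.Add(default);
                    }
                    var result = await task.ConfigureAwait(false);
                    lock (locker) results[index] = result;
                }
            }
            catch (Exception)
            {
                lock (locker) failed = true;
                throw;
            }
        }).ToArray();
        await Task.WhenAll(workerTasks).ConfigureAwait(false);
    }
    lock (locker) return results.ToArray();
}
```

With a concurrency level of 2, the throttled variant starts between two and four operations, depending on whether operation 1 or operation 2 finishes first, and surfaces one InvalidOperationException. The native version starts all ten and reports all five failures.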