C# Parallel.ForEach() memory usage keeps growing

情歌与酒 2020-12-12 06:58
public string SavePath { get; set; } = @"I:\files\";

public void DownloadList(List<string> list)
{
    var rest = ExcludeDownloaded(list);
    var result          


        
2 Answers
  • Parallel.ForEach is intended for parallelizing CPU-bound workloads. Downloading a file is an I/O-bound workload, so Parallel.ForEach is a poor fit here: it needlessly blocks ThreadPool threads while they wait on the network. The correct approach is asynchronous, with async/await. The recommended class for making asynchronous web requests is HttpClient, and an excellent option for controlling the level of concurrency is the TPL Dataflow library. For this case the simplest component of that library, the ActionBlock class, is enough:

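    // Requires: using System.Net.Http; using System.Threading.Tasks.Dataflow;
    // (ActionBlock ships in the System.Threading.Tasks.Dataflow NuGet package)
    async Task DownloadListAsync(List<string> list)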
    {
        using (var httpClient = new HttpClient())
        {
            var rest = ExcludeDownloaded(list);
            var block = new ActionBlock<string>(async link =>
            {
                await DownloadFileAsync(httpClient, link);
            }, new ExecutionDataflowBlockOptions()
            {
                MaxDegreeOfParallelism = 10
            });
            foreach (var link in rest)
            {
                await block.SendAsync(link);
            }
            block.Complete();
            await block.Completion;
        }
    }
    
    async Task DownloadFileAsync(HttpClient httpClient, string link)
    {
        var fileName = Guid.NewGuid().ToString(); // code to generate unique fileName;
        var filePath = Path.Combine(SavePath, fileName);
        if (File.Exists(filePath)) return;
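        // ResponseHeadersRead streams the body instead of buffering it all in memory.
        using (var response = await httpClient.GetAsync(link,
            HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode(); // throw on non-2xx before creating the file
            using (var contentStream = await response.Content.ReadAsStreamAsync())
            using (var fileStream = new FileStream(filePath, FileMode.Create,
                FileAccess.Write, FileShare.None, 32768, FileOptions.Asynchronous))
            {
                await contentStream.CopyToAsync(fileStream);
            }
        }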
    }
    

    The code for downloading a file with HttpClient is not as simple as the WebClient.DownloadFile(), but it's what you have to do in order to keep the whole process asynchronous (both reading from the web and writing to the disk).


    Caveat: Asynchronous filesystem operations are currently not implemented efficiently in .NET. For maximum efficiency it may be preferable to avoid using the FileOptions.Asynchronous option in the FileStream constructor.
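
    As a rough illustration of that caveat, the file-writing step could be written without FileOptions.Asynchronous, as in the sketch below. The class and method names (FileSaveSketch, SaveStreamAsync) are hypothetical and only serve the example; the FileStream arguments are the same as in the code above except for the last one. Whether this is actually faster depends on the runtime version and the workload, so it is worth measuring.

    ```csharp
    using System.IO;
    using System.Threading.Tasks;

    static class FileSaveSketch
    {
        // Hypothetical helper (not part of the original answer): copies a
        // content stream to disk through a FileStream opened WITHOUT
        // FileOptions.Asynchronous.
        public static async Task SaveStreamAsync(Stream contentStream, string filePath)
        {
            using (var fileStream = new FileStream(filePath, FileMode.Create,
                FileAccess.Write, FileShare.None, 32768, FileOptions.None))
            {
                // CopyToAsync still returns an awaitable Task; only the way the
                // file handle performs its writes changes.
                await contentStream.CopyToAsync(fileStream);
            }
        }
    }
    ```

    Inside DownloadFileAsync the call would then look like await FileSaveSketch.SaveStreamAsync(contentStream, filePath), with contentStream obtained from the response as before.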

  • 2020-12-12 07:35

    Use WebClient.DownloadFile() to download directly to a file so you don't have the whole file in memory.
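
    A minimal sketch of that suggestion might look like the following. The class and method names (WebClientSketch, DownloadToFile), the savePath parameter and the Guid-based file name are assumptions made for illustration; only WebClient.DownloadFile(address, fileName) comes from the answer itself. Note that the call blocks the calling thread until the download finishes, and that WebClient is marked obsolete in recent .NET versions in favor of HttpClient.

    ```csharp
    using System;
    using System.IO;
    using System.Net;

    static class WebClientSketch
    {
        // Hypothetical helper: downloads one link into savePath and returns
        // the path of the file that was written.
        public static string DownloadToFile(string link, string savePath)
        {
            // Placeholder naming scheme; replace with real unique-name logic.
            var filePath = Path.Combine(savePath, Guid.NewGuid().ToString());
            using (var webClient = new WebClient())
            {
                // Writes the response body to the file at filePath rather than
                // returning it as an in-memory byte array.
                webClient.DownloadFile(link, filePath);
            }
            return filePath;
        }
    }
    ```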
