async file reading 40 times slower than synchronous or manual Threads

家住魔仙堡 submitted on 2020-08-26 07:12:06

Question


I have 3 files, each 1 million rows long and I'm reading them line by line. No processing, just reading as I'm just trialling things out.

If I do this synchronously it takes 1 second. If I switch to using Threads, one for each file, it is slightly quicker (code not below, but I simply created a new Thread and started it for each file).

When I switch to async, it takes 40 seconds, which is 40 times as long. Once I add any actual processing, I cannot see why I would ever choose async over the synchronous version, or over Threads if I wanted a responsive application.

Or am I doing something fundamentally wrong with this code and not using async as it was intended?

Thanks.

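using System;
using System.Diagnostics;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

class AsyncTestIOBound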
{
    Stopwatch sw = new Stopwatch();
    internal void Tests()
    {
        DoSynchronous();
        DoASynchronous();
    }
    #region sync
    private void DoSynchronous()
    {
        sw.Restart();
        var start = sw.ElapsedMilliseconds;
        Console.WriteLine($"Starting Sync Test");
        DoSync("Addresses", "SampleLargeFile1.txt");
        DoSync("routes   ", "SampleLargeFile2.txt");
        DoSync("Equipment", "SampleLargeFile3.txt");
        sw.Stop();
        Console.WriteLine($"Ended Sync Test. Took {(sw.ElapsedMilliseconds - start)} mseconds");
        Console.ReadKey();
    }

    private long DoSync(string v, string filename)
    {
        string line;
        long counter = 0;
        using (StreamReader file = new StreamReader(filename))
        {
            while ((line = file.ReadLine()) != null)
            {
                counter++;
            }
        }
        Console.WriteLine($"{v}: T{Thread.CurrentThread.ManagedThreadId}: Lines: {counter}");
        return counter;
    }
    #endregion

    #region async
    private void DoASynchronous()
    {
        sw.Restart();
        var start = sw.ElapsedMilliseconds;
        Console.WriteLine($"Starting Async Test");
        Task a=DoASync("Addresses", "SampleLargeFile1.txt");
        Task b=DoASync("routes   ", "SampleLargeFile2.txt");
        Task c=DoASync("Equipment", "SampleLargeFile3.txt");
        Task.WaitAll(a, b, c);
        sw.Stop();
        Console.WriteLine($"Ended Async Test. Took {(sw.ElapsedMilliseconds - start)} mseconds");
        Console.ReadKey();
    }

    private async Task<long> DoASync(string v, string filename)
    {
        string line;
        long counter = 0;
        using (StreamReader file = new StreamReader(filename))
        {
            while ((line = await file.ReadLineAsync()) != null)
            {
                counter++;
            }
        }
        Console.WriteLine($"{v}: T{Thread.CurrentThread.ManagedThreadId}: Lines: {counter}");
        return counter;
    }
    #endregion

}

Answer 1:


Because you await once per line in a giant loop (in your case, for every line of each "SampleLargeFile"), you are doing a lot of context switching, and the overhead can be really bad.

For each line, your code may be switching between files. If your computer uses a hard drive, this can get even worse. Imagine the head of your HD going crazy.

When you use normal threads, you are not switching context for each line.

To solve this, read the file in a single call. You can still use async/await (ReadToEndAsync()) and get good performance.

EDIT

So, you are trying to count the lines in a text file using async, right?

Try this (no need to load the entire file in memory):

private async Task<int> CountLines(string path)
{
    int count = 0;
    await Task.Run(() =>
    {
        using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (BufferedStream bs = new BufferedStream(fs))
        using (StreamReader sr = new StreamReader(bs))
        {
            while (sr.ReadLine() != null)
            {
                count++;
            }
        }
    });
    return count;
}
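
As a rough sketch of how a method like CountLines could be applied to several files at once: each file gets its own Task.Run with a synchronous ReadLine loop inside, and the results are awaited together with Task.WhenAll. The names LineCountDemo, CountLinesAsync and CountAllAsync are made up for this illustration and are not part of the answer above.

```csharp
using System.IO;
using System.Threading.Tasks;

static class LineCountDemo
{
    // One background task per file; the read loop inside is fully synchronous,
    // so there is no await per line.
    public static Task<int> CountLinesAsync(string path)
    {
        return Task.Run(() =>
        {
            int count = 0;
            using (var sr = new StreamReader(path))
            {
                while (sr.ReadLine() != null)
                {
                    count++;
                }
            }
            return count;
        });
    }

    // Starts every count first, then awaits them all in one go.
    public static async Task<int[]> CountAllAsync(params string[] paths)
    {
        var tasks = new Task<int>[paths.Length];
        for (int i = 0; i < paths.Length; i++)
        {
            tasks[i] = CountLinesAsync(paths[i]);
        }
        return await Task.WhenAll(tasks);
    }
}
```

This keeps the caller responsive (it only awaits once), but it still occupies one thread-pool thread per file while reading, so it is closer to the "one Thread per file" variant from the question than to true asynchronous IO.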



Answer 2:


A few things. First, I would read the whole file at once in the async method, so that you await only once (instead of once per line). Note that this loads the entire file into memory.

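private async Task<long> DoASync(string v, string filename)
{
    string lines;
    long counter = 0;
    using (StreamReader file = new StreamReader(filename))
    {
        // Single await: read the whole file into memory in one call.
        lines = await file.ReadToEndAsync();
    }
    // Split('\n') yields one extra empty entry if the file ends with a newline.
    counter = lines.Split('\n').Length;
    Console.WriteLine($"{v}: T{Thread.CurrentThread.ManagedThreadId}: Lines: {counter}");
    return counter;
}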

Next, you can also await each Task individually. That way the CPU works on one file at a time instead of possibly switching between the three, which causes more overhead.

// Returns Task rather than void so the caller can await (or Wait on) completion.
private async Task DoASynchronous()
{
    sw.Restart();
    var start = sw.ElapsedMilliseconds;
    Console.WriteLine($"Starting Async Test");
    await DoASync("Addresses", "SampleLargeFile1.txt");
    await DoASync("routes   ", "SampleLargeFile2.txt");
    await DoASync("Equipment", "SampleLargeFile3.txt");
    sw.Stop();
    Console.WriteLine($"Ended Async Test. Took {(sw.ElapsedMilliseconds - start)} mseconds");
    Console.ReadKey();
}

You are seeing slower performance because of the per-call cost of await. Every awaited line adds CPU work: the async machinery adds processing, allocations and synchronization. Also, the code has to transition to kernel mode twice instead of once (first to initiate the IO, then to dequeue the IO completion notification).

For more information, see: Does async await increases Context switching
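
If the cost is paid per await, one way to keep async IO without loading the whole file is to await once per large chunk rather than once per line. The following is only a sketch under my own assumptions: the class and method names, the 64 KB buffer size and the FileOptions flags are illustrative choices, not something taken from the answers, and I have not measured it against the files in the question.

```csharp
using System.IO;
using System.Threading.Tasks;

static class ChunkedLineCounter
{
    // Counts '\n' characters, awaiting once per buffer-sized chunk instead of once per line.
    // Assumption: lines are terminated by '\n' (or "\r\n"); a final line without a
    // trailing newline is not counted, unlike a ReadLine-based loop.
    public static async Task<long> CountNewlinesAsync(string path, int bufferSize = 64 * 1024)
    {
        long newlines = 0;
        var buffer = new char[bufferSize];

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read,
                                       bufferSize, FileOptions.Asynchronous | FileOptions.SequentialScan))
        using (var reader = new StreamReader(fs))
        {
            int read;
            while ((read = await reader.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == '\n')
                    {
                        newlines++;
                    }
                }
            }
        }
        return newlines;
    }
}
```

With a 64 KB buffer, a file of a million short lines needs on the order of hundreds of awaits rather than a million, which is the same idea as ReadToEndAsync() but with bounded memory use.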



Source: https://stackoverflow.com/questions/54753339/async-file-reading-40-times-slower-than-synchronous-or-manual-threads
