Is StreamReader.ReadLine() really the fastest method to count lines in a file?

逝去的感伤

After looking around for a while, I found quite a few discussions on how to figure out the number of lines in a file.

For example these three:
c# how do I count l

7 answers

离开以前

2020-12-30 06:55

Yes, reading lines like that is the fastest and easiest way in any practical sense.

There are no shortcuts here. Files are not line based, so you have to read every single byte from the file to determine how many lines there are.

As TomTom pointed out, creating the strings is not strictly needed to count the lines, but a vast majority of the time spent will be waiting for the data to be read from the disk. Writing a much more complicated algorithm would perhaps shave off a percent of the execution time, and it would dramatically increase the time for writing and testing the code.


借酒劲吻你

2020-12-30 06:57

StreamReader is not the fastest way to read files in general, because of the small overhead of decoding the bytes into characters, so reading the file into a byte array is faster.
The results I get are a bit different each time due to caching and other processes, but here is one of the results I got (in milliseconds) with a 16 MB file:

75 ReadLines 
82 ReadLine 
22 ReadAllBytes 
23 Read 32K 
21 Read 64K 
27 Read 128K

In general, File.ReadLines should be a little bit slower than a StreamReader.ReadLine loop. File.ReadAllBytes is slower with bigger files and can throw an OutOfMemoryException with huge files. The default buffer size for FileStream is 4K, but on my machine 64K seemed the fastest.

    private static int countWithReadLines(string filePath)
    {
        int count = 0;
        var lines = File.ReadLines(filePath);

        foreach (var line in lines) count++;
        return count;
    }

    private static int countWithReadLine(string filePath)
    {
        int count = 0;
        using (var sr = new StreamReader(filePath))      
            while (sr.ReadLine() != null)
                count++;
        return count;
    }

    private static int countWithFileStream(string filePath, int bufferSize = 1024 * 4)
    {
        using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        {
            int count = 0;
            byte[] array = new byte[bufferSize];

            while (true)
            {
                int length = fs.Read(array, 0, bufferSize);

                // Read returns 0 only at end of stream; a short read is not necessarily the end.
                if (length == 0) return count;

                for (int i = 0; i < length; i++)
                    if (array[i] == 10) // 10 is the line feed byte '\n'
                        count++;
            }
        } // end of using
    }
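
The test code in this answer calls countWithReadAllBytes, which is not shown. A minimal sketch, assuming it simply loads the whole file and counts line feed bytes the same way countWithFileStream does (the class name ReadAllBytesSketch is hypothetical):

```csharp
using System.IO;

static class ReadAllBytesSketch
{
    // Hypothetical reconstruction; the answer does not show this method.
    public static int countWithReadAllBytes(string filePath)
    {
        int count = 0;

        // Loads the entire file into memory, so it only suits files that fit in RAM.
        byte[] bytes = File.ReadAllBytes(filePath);

        foreach (byte b in bytes)
            if (b == 10) // 10 is the line feed byte '\n'
                count++;

        return count;
    }
}
```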

and tested with:

var path = "1234567890.txt"; Stopwatch sw; string s = "";
File.WriteAllLines(path, Enumerable.Repeat("1234567890abcd", 1024 * 1024 )); // 16MB (16 bytes per line)

sw = Stopwatch.StartNew(); countWithReadLines(path)   ; sw.Stop(); s += sw.ElapsedMilliseconds + " ReadLines \n";
sw = Stopwatch.StartNew(); countWithReadLine(path)    ; sw.Stop(); s += sw.ElapsedMilliseconds + " ReadLine \n";
sw = Stopwatch.StartNew(); countWithReadAllBytes(path); sw.Stop(); s += sw.ElapsedMilliseconds + " ReadAllBytes \n";

sw = Stopwatch.StartNew(); countWithFileStream(path, 1024 * 32); sw.Stop(); s += sw.ElapsedMilliseconds + " Read 32K \n";
sw = Stopwatch.StartNew(); countWithFileStream(path, 1024 * 64); sw.Stop(); s += sw.ElapsedMilliseconds + " Read 64K \n";
sw = Stopwatch.StartNew(); countWithFileStream(path, 1024 *128); sw.Stop(); s += sw.ElapsedMilliseconds + " Read 128K \n";

MessageBox.Show(s);
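
Since the timings vary between runs because of caching and other processes, one option is to warm up once and report the median of several runs rather than a single measurement. The helper below is only a sketch of that idea; TimingSketch and MedianMilliseconds are hypothetical names, not part of the answer.

```csharp
using System;
using System.Diagnostics;

static class TimingSketch
{
    // Hypothetical helper: one warm-up call, then the middle value of `runs` timings (ms).
    public static long MedianMilliseconds(Func<int> action, int runs = 5)
    {
        if (runs < 1) throw new ArgumentOutOfRangeException(nameof(runs));

        action(); // warm-up: lets the JIT compile the method and the OS cache the file

        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            samples[i] = sw.ElapsedMilliseconds;
        }

        Array.Sort(samples);
        return samples[runs / 2]; // middle sample (upper one when runs is even)
    }
}
```

It could be called as TimingSketch.MedianMilliseconds(() => countWithReadLine(path)) for each method under test.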


我在风中等你

2020-12-30 07:00

I tried multiple methods and tested their performance:

The version that reads a single byte at a time is about 50% slower than the other methods, which all take roughly the same time. You could try using threads or asynchronous reads so that, while waiting for one read, you process the data from the previous one. That sounds like a headache to me.

I would go with the one liner: File.ReadLines(filePath).Count(); it performs as well as the other methods I tested.

        private static int countFileLines(string filePath)
        {
            using (StreamReader r = new StreamReader(filePath))
            {
                int i = 0;
                while (r.ReadLine() != null)
                {
                    i++;
                }
                return i;
            }
        }

        private static int countFileLines2(string filePath)
        {
            using (Stream s = new FileStream(filePath, FileMode.Open))
            {
                int i = 0;
                int b;

                b = s.ReadByte();
                while (b >= 0)
                {
                    if (b == 10)
                    {
                        i++;
                    }
                    b = s.ReadByte();
                }
                return i + 1; // also counts a last line with no '\n'; over-counts by one if the file is empty or ends with '\n'
            }
        }

        // Assumed value; the original answer does not show where bufferSize is defined.
        private const int bufferSize = 64 * 1024;

        private static int countFileLines3(string filePath)
        {
            using (Stream s = new FileStream(filePath, FileMode.Open))
            {
                int i = 0;
                byte[] b = new byte[bufferSize];
                int n = 0;

                n = s.Read(b, 0, bufferSize);
                while (n > 0)
                {
                    i += countByteLines(b, n);
                    n = s.Read(b, 0, bufferSize);
                }
                return i + 1;
            }
        }

        private static int countByteLines(byte[] b, int n)
        {
            int i = 0;
            for (int j = 0; j < n; j++)
            {
                if (b[j] == 10)
                {
                    i++;
                }
            }

            return i;
        }

        private static int countFileLines4(string filePath)
        {
            return File.ReadLines(filePath).Count();
        }
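
The overlapped-read idea mentioned in this answer (processing one buffer while the next read is in flight) could look roughly like the sketch below. It is an illustration under assumptions, not something the answer tested: OverlappedReadSketch and CountNewlinesAsync are hypothetical names, and it counts line feed bytes only.

```csharp
using System.IO;
using System.Threading.Tasks;

static class OverlappedReadSketch
{
    // Double buffering: count newlines in `current` while the next read fills `next`.
    public static async Task<long> CountNewlinesAsync(Stream stream, int bufferSize = 64 * 1024)
    {
        byte[] current = new byte[bufferSize];
        byte[] next = new byte[bufferSize];
        long count = 0;

        int n = await stream.ReadAsync(current, 0, current.Length);
        while (n > 0)
        {
            // Start the next read before scanning the data we already have.
            Task<int> pending = stream.ReadAsync(next, 0, next.Length);

            for (int i = 0; i < n; i++)
                if (current[i] == 10) // 10 is the line feed byte '\n'
                    count++;

            n = await pending;

            // Swap buffers so `current` always holds the most recently read data.
            byte[] tmp = current;
            current = next;
            next = tmp;
        }
        return count;
    }
}
```

For a real file, the read only overlaps with the scan if the FileStream is opened for asynchronous I/O (for example with useAsync: true); whether that beats the simple synchronous loop would need measuring.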


一生所求

2020-12-30 07:01
The best way to figure out how to do this fast is to think about how it would be done at the lowest level, below even C/C++.

In x86 assembly there is a CPU-level instruction that scans memory for a byte value, so in assembly you would do the following:
- Read a big part (or all) of the file into memory
- Execute the SCASB instruction
- Repeat as needed
So, in C# you want the compiler to get as close to that as possible.
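
One way to approximate that in C# is to hand the scan to a library search routine instead of a hand-written loop. The sketch below uses Array.IndexOf to jump from one line feed to the next inside a buffer that has already been read; IndexOfSketch and CountByte are hypothetical names, and whether the runtime compiles the search down to anything resembling a scan instruction is an assumption that would need checking on the target platform.

```csharp
using System;

static class IndexOfSketch
{
    // Counts occurrences of `target` in the first `length` bytes of `buffer`.
    public static int CountByte(byte[] buffer, int length, byte target = 10)
    {
        int count = 0;
        int pos = 0;

        while (pos < length)
        {
            // Search only the unscanned part of the valid region.
            int hit = Array.IndexOf(buffer, target, pos, length - pos);
            if (hit < 0) break; // no more matches

            count++;
            pos = hit + 1; // continue just past the match
        }
        return count;
    }
}
```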

盖世英雄少女心

2020-12-30 07:13

public static int CountLines(Stream stm)
{
    StreamReader _reader = new StreamReader(stm);
    int c = 0, count = 0;
    while ((c = _reader.Read()) != -1)
    {
        if (c == '\n')
        {
            count++;
        }
    }
    return count;
}


花落未央

2020-12-30 07:18

No, it is not. The point is that it materializes the strings, which is not needed.

To COUNT lines, you are much better off ignoring the "string" part and focusing on the "line" part.

A LINE is a series of bytes ending with \r\n (13, 10, i.e. CR LF) or another marker.

Just run along the bytes, in a buffered stream, counting the number of appearances of your end-of-line marker.
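
A minimal sketch of that approach, assuming an ASCII-compatible encoding such as UTF-8 and treating CR LF, a lone LF, and a lone CR each as one end-of-line marker. LineTerminatorSketch and CountLines are hypothetical names; the choice to count a final line that has no terminator is also an assumption, made so the result lines up with a ReadLine loop.

```csharp
using System.IO;

static class LineTerminatorSketch
{
    public static long CountLines(Stream stream, int bufferSize = 64 * 1024)
    {
        byte[] buffer = new byte[bufferSize];
        long lines = 0;
        bool previousWasCr = false; // carries CR state across buffer boundaries
        bool pendingText = false;   // true when bytes follow the last terminator
        int n;

        while ((n = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < n; i++)
            {
                byte b = buffer[i];
                if (b == 13) // CR ends a line on its own or as the start of CR LF
                {
                    lines++;
                    previousWasCr = true;
                    pendingText = false;
                }
                else if (b == 10) // LF: already counted if it completes a CR LF pair
                {
                    if (!previousWasCr) lines++;
                    previousWasCr = false;
                    pendingText = false;
                }
                else
                {
                    previousWasCr = false;
                    pendingText = true;
                }
            }
        }

        // Count a trailing line that has no end-of-line marker.
        return pendingText ? lines + 1 : lines;
    }
}
```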
