Is StreamReader.ReadLine() really the fastest method to count lines in a file?

逝去的感伤 2020-12-30 06:23

After looking around for a while, I found quite a few discussions on how to count the number of lines in a file.

For example these three:
c# how do I count l

7 answers
  •  借酒劲吻你
    2020-12-30 06:57

    StreamReader is not the fastest way to read files in general, because it adds a small overhead for decoding the bytes into characters. Reading the file into a byte array is therefore faster.
    The results differ a bit on each run due to caching and other processes, but here is one set of timings (in milliseconds) for a 16 MB file:

    75 ReadLines 
    82 ReadLine 
    22 ReadAllBytes 
    23 Read 32K 
    21 Read 64K 
    27 Read 128K 
    

    In general, File.ReadLines should be a little slower than a StreamReader.ReadLine loop (although in the run above it came out slightly faster). File.ReadAllBytes gets slower with bigger files and throws an OutOfMemoryException with huge files. The default buffer size for FileStream is 4K, but on my machine 64K seemed the fastest.

        // Requires: using System.IO;
    
        // Counts lines by lazily enumerating them with File.ReadLines.
        private static int countWithReadLines(string filePath)
        {
            int count = 0;
            var lines = File.ReadLines(filePath);
    
            foreach (var line in lines) count++;
            return count;
        }
    
        // Counts lines with an explicit StreamReader.ReadLine loop.
        private static int countWithReadLine(string filePath)
        {
            int count = 0;
            using (var sr = new StreamReader(filePath))
                while (sr.ReadLine() != null)
                    count++;
            return count;
        }
    
        // Counts '\n' bytes while reading the file in chunks of bufferSize bytes.
        private static int countWithFileStream(string filePath, int bufferSize = 1024 * 4)
        {
            using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
            {
                int count = 0;
                int length;
                byte[] array = new byte[bufferSize];
    
                // Read returns 0 only at end of stream; a short read does not imply EOF.
                while ((length = fs.Read(array, 0, bufferSize)) > 0)
                {
                    for (int i = 0; i < length; i++)
                        if (array[i] == 10) // 10 is the byte value of '\n'
                            count++;
                }
    
                return count;
            } // end of using
        }
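
    The benchmark calls countWithReadAllBytes, but its body is not shown in this answer. A minimal sketch of what it presumably looks like, assuming it counts '\n' bytes the same way as countWithFileStream (the wrapper class name LineCountSketch is hypothetical):

```csharp
using System.IO;

static class LineCountSketch
{
    // Assumed implementation: the answer benchmarks this method but does not show it.
    // Loads the whole file into memory, then counts line-feed bytes.
    public static int countWithReadAllBytes(string filePath)
    {
        int count = 0;
        byte[] bytes = File.ReadAllBytes(filePath);

        foreach (byte b in bytes)
            if (b == 10) // 10 is the byte value of '\n'
                count++;

        return count;
    }
}
```

    Note that counting '\n' bytes can give one less than a ReadLine loop when the last line has no trailing newline.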
    

    I tested them with:

    var path = "1234567890.txt"; Stopwatch sw; string s = "";
    File.WriteAllLines(path, Enumerable.Repeat("1234567890abcd", 1024 * 1024 )); // 16MB (16 bytes per line)
    
    sw = Stopwatch.StartNew(); countWithReadLines(path)   ; sw.Stop(); s += sw.ElapsedMilliseconds + " ReadLines \n";
    sw = Stopwatch.StartNew(); countWithReadLine(path)    ; sw.Stop(); s += sw.ElapsedMilliseconds + " ReadLine \n";
    sw = Stopwatch.StartNew(); countWithReadAllBytes(path); sw.Stop(); s += sw.ElapsedMilliseconds + " ReadAllBytes \n";
    
    sw = Stopwatch.StartNew(); countWithFileStream(path, 1024 * 32); sw.Stop(); s += sw.ElapsedMilliseconds + " Read 32K \n";
    sw = Stopwatch.StartNew(); countWithFileStream(path, 1024 * 64); sw.Stop(); s += sw.ElapsedMilliseconds + " Read 64K \n";
    sw = Stopwatch.StartNew(); countWithFileStream(path, 1024 *128); sw.Stop(); s += sw.ElapsedMilliseconds + " Read 128K \n";
    
    MessageBox.Show(s);
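
    Because single timings vary with caching and other processes, one way to reduce the noise is to warm up once and keep the best of several runs. This is only a sketch, not part of the measurements above; the Bench class and BestOf method are hypothetical names:

```csharp
using System;
using System.Diagnostics;

static class Bench
{
    // Hypothetical helper: calls `action` once as a warm-up, then `runs` more
    // times, and returns the fastest measured run in milliseconds.
    public static long BestOf(int runs, Action action)
    {
        action(); // warm-up: JIT compilation and OS file cache

        long best = long.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            best = Math.Min(best, sw.ElapsedMilliseconds);
        }
        return best;
    }
}
```

    It could be used as Bench.BestOf(5, () => countWithReadLine(path)) for each of the methods being compared.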
    
