Count the number of lines in a file without reading entire file into memory?

忘掉有多难 2020-12-24 01:38

I'm processing huge data files (millions of lines each).

Before I start processing I'd like to get a count of the number of lines in the file, so I can then indic

15 answers
  •  感动是毒
    2020-12-24 02:09

    I benchmarked three approaches on a file with more than 135k lines. This is my benchmark code:

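     require 'benchmark'

     file_name = '100m.csv'
     Benchmark.bm do |x|
       # Read every line into an array, then count the elements.
       x.report { File.readlines(file_name).size }
       # Shell out to wc and parse the leading number from its output.
       x.report { `wc -l "#{file_name}"`.strip.split(' ')[0].to_i }
       # Read the whole file into one string and count newline characters.
       x.report { File.read(file_name).scan(/\n/).count }
     end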
    

    The results, in the same order as the reports above, are:

       user     system      total        real
     0.100000   0.040000   0.140000 (  0.143636)
     0.000000   0.000000   0.090000 (  0.093293)
     0.380000   0.060000   0.440000 (  0.464925)
    

    The wc -l approach has one problem: wc counts newline characters, so if the file has only one line and that line does not end with \n, the count is zero.

    So I recommend calling wc only when the file has more than one line.
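
    If shelling out to wc is not an option, the same count can be obtained in plain Ruby without loading the whole file. The following is only a minimal sketch, not part of the benchmark above: count_lines and its chunk_size parameter are hypothetical names chosen for illustration. It reads the file in fixed-size chunks, counts the \n characters in each chunk, and adds one if a non-empty file does not end with \n, which covers the case where wc -l reports zero.

```ruby
# Hypothetical helper (not from the original answer): count lines by
# streaming fixed-size chunks, so memory use stays bounded by chunk_size.
def count_lines(path, chunk_size = 1 << 16)
  count = 0
  last_char = nil
  File.open(path, 'rb') do |f|
    # IO#read with a positive length returns nil at end of file.
    while (chunk = f.read(chunk_size))
      count += chunk.count("\n")
      last_char = chunk[-1]
    end
  end
  # A non-empty file whose last byte is not a newline still has one
  # final, unterminated line that the newline count missed.
  count += 1 if last_char && last_char != "\n"
  count
end
```

    Calling count_lines('100m.csv') would then return the line count as an Integer. A shorter alternative is File.foreach(file_name).count, which also avoids holding the whole file in memory because it reads one line at a time. I have not benchmarked either against the three approaches above.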
