Count the number of lines in a file without reading entire file into memory?


忘掉有多难 2020-12-24 01:38

I'm processing huge data files (millions of lines each).

Before I start processing I'd like to get a count of the number of lines in the file, so I can then indic

15 Answers

感动是毒 (OP)

2020-12-24 02:09

The test results for more than 135k lines are shown below. This is my benchmark code.

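 require 'benchmark'

 file_name = '100m.csv'
 Benchmark.bm do |x|
   # Read every line into an array, then count the elements
   x.report { File.new(file_name).readlines.size }
   # Shell out to wc and parse the leading number from its output
   x.report { `wc -l "#{file_name}"`.strip.split(' ')[0].to_i }
   # Read the whole file into one string and count newline characters
   x.report { File.read(file_name).scan(/\n/).count }
 end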

The result is:

   user     system      total        real
 0.100000   0.040000   0.140000 (  0.143636)
 0.000000   0.000000   0.090000 (  0.093293)
 0.380000   0.060000   0.440000 (  0.464925)

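The wc -l approach has one problem: wc -l counts newline characters, not lines. If the file has only one line and that line does not end with \n, the count is zero. More generally, any file whose last line lacks a trailing \n is reported one line short.

So I recommend calling wc when you are counting more than one line.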
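For a pure-Ruby option that avoids both loading the whole file and the trailing-newline undercount described above, something like the following might work. It is only a sketch: `count_lines` and `CHUNK_SIZE` are hypothetical names, the 64 KiB chunk size is an arbitrary assumption, and it was not part of the benchmark above, so no timing is claimed for it.

```ruby
# Bytes read per iteration; an arbitrary choice for this sketch.
CHUNK_SIZE = 64 * 1024

# Hypothetical helper: counts lines by streaming the file in fixed-size
# chunks, so only one chunk is held in memory at a time.
def count_lines(path)
  count = 0
  last_chunk = nil
  File.open(path, 'rb') do |file|
    # IO#read with a length returns nil at end of file, which ends the loop.
    while (chunk = file.read(CHUNK_SIZE))
      count += chunk.count("\n")
      last_chunk = chunk
    end
  end
  # A non-empty file whose last byte is not "\n" has one more, unterminated line.
  count += 1 if last_chunk && !last_chunk.end_with?("\n")
  count
end
```

It would be called as `count_lines(file_name)`. If simplicity matters more than control over the read size, `File.foreach(file_name).count` also iterates line by line without building an array of all lines.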
