I'm processing huge data files (millions of lines each).
Before I start processing I'd like to get a count of the number of lines in the file, so I can then indic
The test results for a file with more than 135k lines are shown below. This is my benchmark code:
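require 'benchmark'

file_name = '100m.csv'

Benchmark.bm do |x|
  # Read every line into an array, then count the elements
  x.report { File.readlines(file_name).size }
  # Shell out to wc and parse the leading number from its output
  x.report { `wc -l "#{file_name}"`.strip.split(' ')[0].to_i }
  # Read the whole file into one string and count newline characters
  x.report { File.read(file_name).scan(/\n/).count }
end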
The result is:
       user     system      total        real
   0.100000   0.040000   0.140000 (  0.143636)
   0.000000   0.000000   0.090000 (  0.093293)
   0.380000   0.060000   0.440000 (  0.464925)
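The wc -l approach has one problem: wc -l counts newline characters, not lines.
If the file contains only one line and does not end with \n, the count is zero (more generally, a last line without a trailing \n is not counted).
So I recommend calling wc only when you are counting more than one line.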
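One possible workaround for the missing-newline case is sketched below. It assumes a Unix-like system with wc on the PATH, and count_lines is a hypothetical helper name, not part of the benchmark above. The idea is to take the wc -l count and add one when the file is non-empty and its last byte is not \n.

```ruby
require 'shellwords'

# Hypothetical helper (an assumption, not from the benchmark above):
# count lines with `wc -l`, then correct for a final line that has no
# trailing newline. Assumes `wc` is available on the PATH.
def count_lines(path)
  # wc prints "<count> <filename>"; to_i skips leading whitespace and
  # parses the leading integer.
  count = `wc -l #{Shellwords.escape(path)}`.to_i

  # An empty file has no lines and no last byte to inspect.
  return count if File.size(path).zero?

  # Read only the last byte instead of loading the whole file.
  last_byte = File.open(path, 'rb') do |f|
    f.seek(-1, IO::SEEK_END)
    f.read(1)
  end

  last_byte == "\n" ? count : count + 1
end
```

Called as count_lines('100m.csv'), this keeps the speed of wc for large files while also counting a final line that is not newline-terminated; the extra cost is a single one-byte read.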