Question
Parsing the CSV file was very slow, so I tried loading the file directly into a temporary table in the database and doing the computation from there, as shown below.
Originally the code looked like this, and it took 13 minutes to add the entries:
CSV.foreach(fileName) do |line|
  completePath = line[0]
  num_of_bps = line[1]
  completePath = cluster_path + '/' + completePath
  # Two SELECT queries per CSV row: one for the inode, one for its metric instance
  inode = FileOrFolder.find_by_fullpath(completePath, :select=>"id")
  metric_instance = MetricInstance.find(:first, :conditions=>["file_or_folder_id = ? AND dataset_id = ?", inode.id, dataset_id])
  add_entry(metric_instance.id, num_of_bps, num_of_bp_tests)
end

# Builds and saves one Bp record (one INSERT per call)
def self.add_entry(metaid, num_of_bps, num_of_bp_tests)
  entry = Bp.new
  entry.metric_instance_id = metaid
  entry.num_of_bps = num_of_bps
  entry.num_of_bp_tests = num_of_bp_tests
  entry.save
  entry
end
I then changed the method to the following, which now takes 52 minutes :(
# Loads every temp-table row and instantiates a model object for each one
@bps = TempTable.all
@bps.each do |bp|
  completePath = bp.first_column
  num_of_bps = bp.second_column
  num_of_bps3 = bp.third_column
  completePath = cluster_path + '/' + completePath
  inode = FileOrFolder.find_by_fullpath(completePath, :select=>"id")
  num_of_bp_tests = 0
  # Skip rows whose path has no matching inode
  unless inode.nil?
    num_of_bp_tests = 1 if num_of_bps != '0'
    metric_instance = MetricInstance.find(:first, :conditions=>["file_or_folder_id = ? AND dataset_id = ?", inode.id, dataset_id])
    add_entry(metric_instance.id, num_of_bps, num_of_bp_tests)
  end
end
Please help me optimize this code, or let me know whether you think iterating with CSV.foreach is faster than reading from the database!
Answer 1:
When you load the CSV into the database, you:
- load N CSV lines
- insert N records into the DB
- select and instantiate N ActiveRecord models
- iterate over them
When you work with the raw CSV, you only:
- load N CSV lines
- iterate over them
So of course the raw CSV approach is faster.
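Both versions in the question also issue two lookups and one insert for every row, so it may help to keep the plain CSV loop and resolve the ids from in-memory hashes instead. The following is only a minimal sketch of that idea under stated assumptions, not the asker's actual code: `build_bp_rows`, `inode_ids`, and `metric_ids` are hypothetical names, and the two hashes are assumed to have been filled beforehand (for example, by one query per table). It uses plain Ruby hashes and the standard `csv` library so that it runs without a database.

```ruby
require "csv"

# Hypothetical helper (not from the original post): turns CSV text into an
# array of row hashes, looking ids up in memory instead of querying per row.
#   inode_ids  - assumed preloaded Hash of full path => file_or_folder id
#   metric_ids - assumed preloaded Hash of file_or_folder id => metric instance id
def build_bp_rows(csv_text, cluster_path, inode_ids, metric_ids)
  rows = []
  CSV.parse(csv_text) do |line|
    full_path = "#{cluster_path}/#{line[0]}"
    num_of_bps = line[1]

    inode_id = inode_ids[full_path]
    next if inode_id.nil?            # same skip as the nil check in the question

    metric_id = metric_ids[inode_id]
    next if metric_id.nil?           # extra guard: no metric instance for this inode

    rows << {
      metric_instance_id: metric_id,
      num_of_bps: num_of_bps,
      num_of_bp_tests: (num_of_bps != '0' ? 1 : 0)
    }
  end
  rows
end
```

The returned array could then be written to the database in batches rather than with one `save` per row; how much time that saves would need to be measured on the real data.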
Source: https://stackoverflow.com/questions/17243333/csv-read-v-s-temp-table-read-from-database-optimization-of-the-loop-and-active