CSV read vs. temp table read from database: optimizing the loop and ActiveRecord usage (Ruby)

Posted by 倖福魔咒の on 2019-12-12 05:36:07

Question


Parsing the CSV file was very slow, so I tried loading the file directly into a temp table in the database and then doing the computation as shown below.

Earlier the code looked like this, and it took 13 minutes to add the entries:

CSV.foreach(fileName) do |line|
    completePath = line[0]
    num_of_bps = line[1]

    completePath = cluster_path + '/' + completePath
    inode = FileOrFolder.find_by_fullpath(completePath, :select => "id")

    metric_instance = MetricInstance.find(:first, :conditions => ["file_or_folder_id = ? AND dataset_id = ?", inode.id, dataset_id])
    add_entry(metric_instance.id, num_of_bps, num_of_bp_tests)
end



def self.add_entry(metaid, num_of_bps, num_of_bp_tests)
    entry = Bp.new
    entry.metric_instance_id = metaid
    entry.num_of_bps = num_of_bps
    entry.num_of_bp_tests = num_of_bp_tests
    entry.save
    return entry
end

I then changed the method to the following, and it now takes 52 minutes :(

@bps = TempTable.all

@bps.each do |bp|
    completePath = bp.first_column
    num_of_bps = bp.second_column
    num_of_bps3 = bp.third_column

    completePath = cluster_path + '/' + completePath
    inode = FileOrFolder.find_by_fullpath(completePath, :select => "id")
    num_of_bp_tests = 0
    unless inode.nil?
        if num_of_bps != '0'
            num_of_bp_tests = 1
        end

        metric_instance = MetricInstance.find(:first, :conditions => ["file_or_folder_id = ? AND dataset_id = ?", inode.id, dataset_id])
        add_entry(metric_instance.id, num_of_bps, num_of_bp_tests)
    end
end

Please help me optimize this code, or let me know if you think CSV.foreach is faster than reading from the database!


Answer 1:


When you load the CSV into the database, you:

  • load N CSV lines
  • insert N records into the DB
  • select and instantiate N ActiveRecord models
  • iterate over them

When you work with the raw CSV, you only:

  • load N CSV lines
  • iterate over them

Of course it's faster.
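
To make the step count concrete, here is a minimal sketch that runs both pipelines on a tiny in-memory CSV and tallies the per-row operations. It does not touch a real database: the `TempRow` struct, the `table` array, and the `CSV_TEXT` sample data are all hypothetical stand-ins for the temp table, its ActiveRecord model, and the real file, so it only illustrates the amount of work, not actual timings.

```ruby
require 'csv'

# Hypothetical sample data standing in for the real file (path, num_of_bps).
CSV_TEXT = "a/b.rb,3\na/c.rb,0\nd/e.rb,7\n"

# Hypothetical stand-in for an ActiveRecord model backed by the temp table.
TempRow = Struct.new(:first_column, :second_column)

# Pipeline 1: parse the CSV and iterate over its rows directly.
def direct_pipeline(text)
  steps = Hash.new(0)
  rows = []
  CSV.parse(text) do |line|
    steps[:parse] += 1
    steps[:iterate] += 1
    rows << [line[0], line[1]]
  end
  [rows, steps]
end

# Pipeline 2: parse, "insert" into a table, "select" and instantiate, iterate.
def temp_table_pipeline(text)
  steps = Hash.new(0)
  table = [] # plays the role of the temp table

  CSV.parse(text) do |line|
    steps[:parse] += 1
    table << line.dup # plays the role of an INSERT
    steps[:insert] += 1
  end

  # Plays the role of SELECT plus model instantiation (like TempTable.all).
  records = table.map do |stored|
    steps[:instantiate] += 1
    TempRow.new(stored[0], stored[1])
  end

  rows = []
  records.each do |record|
    steps[:iterate] += 1
    rows << [record.first_column, record.second_column]
  end
  [rows, steps]
end

direct_rows, direct_steps = direct_pipeline(CSV_TEXT)
table_rows, table_steps = temp_table_pipeline(CSV_TEXT)

puts "direct:     #{direct_steps.inspect}"
puts "temp table: #{table_steps.inspect}"
```

Both pipelines end up with the same rows, but the temp table route does two extra operations per line before the loop body even starts. In the real code each of those extra operations is a database round trip or an object allocation, on top of the per-row queries that both versions of the loop already issue.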



Source: https://stackoverflow.com/questions/17243333/csv-read-v-s-temp-table-read-from-database-optimization-of-the-loop-and-active
