Finding MongoDB records in batches (using the Mongoid Ruby adapter)

悲哀的现实 2020-12-14 00:32

Using Rails 3 and MongoDB with the Mongoid adapter, how can I batch finds to MongoDB? I need to grab all the records in a particular MongoDB collection and index them.

6 Answers
  • 2020-12-14 00:50

    With Mongoid, you don't need to manually batch the query.

    In Mongoid, Model.all returns a Mongoid::Criteria instance. When you call #each on that criteria, a Mongo driver cursor is instantiated and used to iterate over the records. The underlying cursor already fetches the records in batches; by default the batch size is 100.

    For more information on this topic, read this comment from the Mongoid author and maintainer.

    In summary, you can just do this:

    Model.all.each do |r|
      Sunspot.index(r)
    end
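
    If you need a different batch size, it can be set on the criteria itself. A minimal sketch, assuming a Mongoid version that supports Criteria#batch_size (the same method an answer further down relies on):

    # Ask the driver to fetch 500 documents per round trip instead of 100
    Model.all.batch_size(500).each do |r|
      Sunspot.index(r)
    end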
    
  • 2020-12-14 01:02

    If you are iterating over a collection where each record requires a lot of processing (e.g. querying an external API for each item), it is possible for the cursor to time out. In that case you need to perform multiple queries so that you never leave a cursor open for too long.

    require 'mongoid'
    
    module Mongoid
      class Criteria
        # Yields every matching document, but issues a fresh limit/skip query
        # per batch, so no single cursor stays open long enough to time out.
        def in_batches_of(count = 100)
          Enumerator.new do |y|
            total = 0
    
            loop do
              batch = 0
    
              self.limit(count).skip(total).each do |item|
                total += 1
                batch += 1
                y << item
              end
    
              # An empty batch means we have paged past the last document
              break if batch == 0
            end
          end
        end
      end
    end
    

    The helper above adds batching to any Mongoid::Criteria. It can be used like so:

    Post.all.order_by(:id => 1).in_batches_of(7).each_with_index do |post, index|
      # call external slow API
    end
    

    Just make sure you ALWAYS have an order_by on your query; otherwise the paging might not do what you want it to. I would also stick with batches of 100 or fewer, since, as noted in the accepted answer, Mongoid already queries in batches of 100, and you never want to leave a cursor open while doing slow processing.

  • 2020-12-14 01:08

    It is faster to send batches to Sunspot as well. This is how I do it:

    records = []
    # Fetch 1000 docs per round trip, keep the cursor alive, and load only
    # the fields that actually get indexed
    Model.batch_size(1000).no_timeout.only(:your_text_field, :_id).all.each do |r|
      records << r
      if records.size >= 1000
        Sunspot.index! records
        records.clear
      end
    end
    # Index whatever is left over from the final partial batch
    Sunspot.index! records unless records.empty?
    

    no_timeout: prevents the cursor from timing out (after 10 minutes, by default)

    only: selects only the id and the fields that are actually indexed

    batch_size: fetches 1000 entries at a time instead of the default 100

  • 2020-12-14 01:11

    I am not sure about built-in batch processing, but you can do it this way:

    current_page = 0
    item_count = Model.count
    while item_count > 0
      Model.all.skip(current_page * 1000).limit(1000).each do |item|
        Sunspot.index(item)
      end
      item_count -= 1000
      current_page += 1
    end
    

    But if you are looking for a long-term solution, I wouldn't recommend this approach; large skip offsets also get progressively slower in MongoDB. Let me explain how I handled the same scenario in my app. Instead of doing batch jobs:

    • I created a Resque job which updates the Solr index:

      class SolrUpdator
        @queue = :solr_updator
      
        def self.perform(item_id)
          item = Model.find(item_id)
          # This uses RSolr; you can change the code below to use Sunspot instead
          solr = RSolr.connect :url => Rails.application.config.solr_path
          js = JSON.parse(item.to_json)
          solr.add js
        end
      end

    • After adding an item, I just push an entry onto the Resque queue (this can also be wired up automatically with a model callback; see the sketch after this list):

      Resque.enqueue(SolrUpdator, item.id.to_s)
      
    • That's all. Start the Resque workers and they will take care of everything.
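
    As a minimal sketch of the callback wiring mentioned above (my assumption, not part of the original answer; enqueue_solr_update is a hypothetical helper name):

    class Model
      include Mongoid::Document

      # Hypothetical hook: enqueue a reindex job every time a document is saved
      after_save :enqueue_solr_update

      private

      def enqueue_solr_update
        Resque.enqueue(SolrUpdator, id.to_s)
      end
    end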
  • 2020-12-14 01:12

    As @RyanMcGeary said, you don't need to worry about batching the query. However, indexing objects one at a time is much, much slower than batching them.

    # Note: to_a loads the entire collection into memory before grouping
    Model.all.to_a.in_groups_of(1000, false) do |records|
      Sunspot.index! records
    end
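
    If the collection is too large to hold in memory, a minimal alternative sketch (assuming your Mongoid version's Criteria includes Enumerable, so each_slice pulls documents from the cursor in groups without loading everything at once):

    Model.all.each_slice(1000) do |records|
      Sunspot.index! records
    end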
    
  • 2020-12-14 01:13

    The following will also work; just try it. Note that in_groups_of is an ActiveSupport Array method, so this carries the same load-everything-into-memory caveat as the previous answer:

    Model.all.in_groups_of(1000, false) do |r|
      Sunspot.index! r
    end
    