I'm trying to run a query of about 50,000 records using ActiveRecord's find_each
method, but it seems to be ignoring my other parameters like so:
With Kaminari or a similar pagination gem this is easy.
module BatchLoader
  extend ActiveSupport::Concern

  def batch_by_page(options = {})
    options = init_batch_options!(options)
    next_page = 1

    loop do
      next_page = yield(next_page, options[:batch_size])
      break next_page if next_page.nil?
    end
  end

  private

  def default_batch_options
    {
      batch_size: 50
    }
  end

  def init_batch_options!(options)
    options ||= {}
    default_batch_options.merge!(options)
  end
end
class ThingRepository
  include BatchLoader

  # @param [Integer] per_page
  # @param [Proc] block
  def batch_changes(per_page = 100, &block)
    relation = Thing.active.order("created_at DESC")

    batch_by_page do |next_page|
      query = relation.page(next_page).per(per_page)
      yield query if block_given?
      query.next_page
    end
  end
end
repo = ThingRepository.new
# batch_changes yields each page to the block; it does not return a collection.
repo.batch_changes(5000) do |g|
  g.each do |t|
    #...
  end
end
You can try the ar-as-batches gem.
From its documentation, you can do something like this:
Users.where(country_id: 44).order(:joined_at).offset(200).as_batches do |user|
  user.party_all_night!
end
Retrieve the ids first, then process them with in_groups_of:
ordered_photo_ids = Photo.order(likes_count: :desc).pluck(:id)
ordered_photo_ids.in_groups_of(1000, false).each do |photo_ids|
  photos = Photo.order(likes_count: :desc).where(id: photo_ids)
  # ...
end
It's important to also add the ORDER BY clause to the inner query.
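A plain-Ruby sketch of why the inner ORDER BY matters, with an in-memory array standing in for the photos table. The PHOTOS data and the where_ids helper are made up for illustration; each_slice plays the role of in_groups_of(n, false), and where_ids mimics a database that returns rows for an id list in table order rather than in the order of the list.

```ruby
# Hypothetical in-memory "table"; a real app would query Photo instead.
PHOTOS = [
  { id: 1, likes_count: 5 },
  { id: 2, likes_count: 50 },
  { id: 3, likes_count: 20 },
  { id: 4, likes_count: 1 },
  { id: 5, likes_count: 30 }
].freeze

# Mimics WHERE id IN (...) without ORDER BY: rows come back in table order,
# not in the order of the ids passed in.
def where_ids(ids)
  PHOTOS.select { |photo| ids.include?(photo[:id]) }
end

# Stand-in for Photo.order(likes_count: :desc).pluck(:id)
ordered_ids = PHOTOS.sort_by { |photo| -photo[:likes_count] }.map { |photo| photo[:id] }

unordered_batches = []
ordered_batches = []
ordered_ids.each_slice(2) do |ids|
  rows = where_ids(ids)
  unordered_batches << rows.map { |photo| photo[:id] }
  # Re-applying the ordering inside each batch restores the intended order.
  ordered_batches << rows.sort_by { |photo| -photo[:likes_count] }.map { |photo| photo[:id] }
end
```

In this toy data the second batch comes back as [1, 3] without the inner sort and as [3, 1] with it, so only the re-sorted batches reproduce the overall likes_count order.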
Rails 6.1 adds support for descending order in find_each, find_in_batches and in_batches.
find_each uses find_in_batches under the hood.
It's not possible to choose the order of the records: as described in find_in_batches, the order is automatically set to ascending on the primary key ("id ASC") to make the batch ordering work.
However, the other criteria are still applied, so what you can do is:
Thing.active.find_each(batch_size: 50000) { |t| puts t.id }
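To illustrate why batching wants a fixed primary-key order, here is a plain-Ruby sketch of the general "fetch rows with an id greater than the last one seen" idea. It is not Rails' actual implementation; ROWS and each_batch are hypothetical names, and an array stands in for the table.

```ruby
# Hypothetical rows standing in for a table; only :id matters here.
ROWS = [7, 2, 9, 4, 1, 12].map { |id| { id: id } }.freeze

# Yields batches of at most batch_size rows, ordered by id ascending.
# Each "query" asks for rows with an id greater than the last id seen,
# which only works if every batch uses the same primary-key order.
def each_batch(rows, batch_size:)
  last_id = nil
  loop do
    batch = rows
            .select { |row| last_id.nil? || row[:id] > last_id }
            .sort_by { |row| row[:id] }
            .first(batch_size)
    break if batch.empty?

    yield batch
    last_id = batch.last[:id]
  end
end

batches = []
each_batch(ROWS, batch_size: 4) { |batch| batches << batch.map { |row| row[:id] } }
# batches is now [[1, 2, 4, 7], [9, 12]]
```

If each batch were sorted by some other column, "the last id seen" would no longer mark a clean boundary, and rows could be skipped or repeated.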
Regarding the limit, it hasn't been implemented yet: https://github.com/rails/rails/pull/5696
To answer your second question, you can create the logic yourself:
total_records = 50000
batch = 1000
(0..(total_records - batch)).step(batch) do |i|
  puts Thing.active.order("created_at DESC").offset(i).limit(batch).to_sql
end
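The range above covers every record only when total_records is an exact multiple of batch. A small sketch of the offset arithmetic, using a hypothetical batch_offsets helper and an exclusive range so that a trailing partial batch is also included:

```ruby
# Returns the OFFSET values needed to cover total_records rows in
# slices of batch_size. The exclusive range keeps a trailing partial batch.
def batch_offsets(total_records, batch_size)
  (0...total_records).step(batch_size).to_a
end

even_offsets = batch_offsets(50_000, 1000)   # 50 offsets: 0, 1000, ..., 49000
uneven_offsets = batch_offsets(2500, 1000)   # [0, 1000, 2000]

# For comparison, the inclusive form stops one batch early on uneven totals.
original_uneven = (0..(2500 - 1000)).step(1000).to_a   # [0, 1000]
```

Each offset would then be passed to offset(i).limit(batch) as in the loop above; the final query simply returns fewer rows when the total is not a multiple of the batch size.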
I was looking for the same behaviour and came up with this solution. It DOES NOT order by created_at, but I thought I would post it anyway.
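max_records_to_retrieve = 50000
# Use the highest id rather than Thing.count: with gaps in the ids, the count
# is lower than the last id and the window would cover more than the limit.
last_index = Thing.maximum(:id).to_i
start_index = [(last_index - max_records_to_retrieve + 1), 0].max
Thing.active.find_each(start: start_index) do |u|
  # do stuff
end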
Drawbacks of this approach:
- You need 2 queries (the first one should be fast).
- This guarantees a maximum of 50K records, but if ids are skipped you will get fewer.
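The second drawback can be seen with a tiny plain-Ruby example. The ids below are invented, and the sketch assumes the start index is derived from the highest id in the table (not from the row count):

```ruby
# Hypothetical ids with gaps (e.g. deleted rows); the highest id is 20.
ids = [1, 2, 5, 8, 13, 14, 19, 20]
max_records = 5

# Assumption: the window starts max_records ids below the highest id.
start_index = [ids.max - max_records + 1, 0].max
selected = ids.select { |id| id >= start_index }
# start_index is 16, so only ids 19 and 20 fall inside the window.
```

The window spans five possible ids (16 through 20), but only two of them exist, so fewer than max_records rows come back. Scopes such as active would shrink the result further.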