Elegant PostgreSQL Group by for Ruby on Rails / ActiveRecord

前端 未结 3 871
南旧
南旧 2020-12-31 11:49

Trying to retrieve an array of ActiveRecord Objects grouped by date with PostgreSQL.

More specifically I\'m trying to translate the following MySQL query:

         


        
相关标签:
3条回答
  • 2020-12-31 12:04

    While SQL is quite simple when it comes to answering questions like "when was the most recent post for each day?" it is NOT very straight forward when you ask "which was the most recent post for each day?"

    You can't retrieve the latest Post for each day without using a sub SELECT (or multiple SQL statements). This might work for you (use Post.find_by_sql or similar):

    SELECT P.*, M.just_day, M.max_created_at
    FROM posts P
    JOIN (
      SELECT date(P2.date) AS just_day, MAX(P2.created_at) AS max_created_at
      FROM posts P2
      P.location_id='12345' AND P.published=true
      GROUP BY date(P2.date)
    ) AS M  
       ON AND M.max_created_at = P.created_at
    WHERE P.location_id='12345' AND P.published=true
    

    The above SQL statement should be enough if you can be certain that two posts will not have the same value in the created_at column. If you can't guarantee uniqueness in the created at column, then you either need to filter out the duplicates in Ruby (that shouldn't be too inefficient because presumably you will be looping over the list anyway) or you will need to do N+1 SQL statements. (Actually you could do per-row selects, but AFAIK that is just as inefficient as N+1 SQL statements.)

    Here is how you could remove duplicates while looping:

    last_post = nil
    posts.each do |post|
      unless post.just_day == last_past.try(:just_day)
        # Do stuff
        last_post = post
      end
    end
    

    That said, you could write it nicely with just Ruby/ActiveRecord, if you have few enough days that a SELECT for each day isn't too bad:

    days = Post.group("date(date)")
    posts = days.each { |day| Post.order('created DESC').where("date(day) = ?", day) }
    

    If you are using pagination (say 10 items per page), then this will require 11 SQL statements for each page. Not ideas, but the simplicity might be worth the inefficiency.

    Honestly, if you expect this query to be both run frequently and with a reasonably large data set, then I suggest you add a boolean column called most_recent. The last post from past days won't change. You only need to worry about the posts from today. Just set up a cron job to run a few minutes after the end of the day to update the value for the last day. If you want something more up-to-date, you could have the cron job run every 5 minutes. Or if you need real-time, then add an after_save callback to set most_recent to false for all today's posts that aren't the current post.

    This question is similar: MySQL: Getting highest score for a user

    0 讨论(0)
  • 2020-12-31 12:14

    The PostgreSQL feature you want to use here is DISTINCT ON. There are two basic ways to go about making this query via ActiveRecord.

    The first method is to just specify the :select and :order options. This works great when you have a fairly simple query with no :joins or :include.

    Post.all(
      :select => 'DISTINCT ON (date::date) *',
      :order => 'date::date DESC, created_at DESC'
    )
    

    If you have a more complex query where ActiveRecord generates its own SELECT clause, you can use a subquery to select the target records.

    Post.all(
      :joins => 'INNER JOIN (SELECT DISTINCT ON (date::date) id FROM posts ORDER BY date::date DESC, created_at DESC) x ON x.id = posts.id'
    )
    

    Note that this could be a fair bit slower than the first method depending on your data. I would only use this method if required. Be sure to benchmark with production-like data.

    0 讨论(0)
  • 2020-12-31 12:25

    My solution:

    def self.columns_list
       column_names.collect { |c| "#{table_name}.#{c}" }.join(",")
     end
    
     scope :selling, joins(:products).group(columns_list)
    

    Simple and repeatable.

    0 讨论(0)
提交回复
热议问题