Select data within one month prior to each user's last record

Posted by 喜夏-厌秋 on 2021-02-08 11:25:41

Question


Assume I have a table called "Diary" like this:

| id | user_id |        recorded_at       | record |
|----|---------|--------------------------|--------|
| 20 |  50245  |2017-10-01 23:00:14.765366|   89   |
| 21 |  50245  |2017-12-05 10:00:33.135331|   97   |
| 22 |  50245  |2017-12-31 11:50:23.965134|   80   |
| 23 |  76766  |2015-10-06 11:00:14.902452|   70   |
| 24 |  76766  |2015-10-07 22:40:59.124553|   81   |

For each user I want to retrieve the latest row and all rows within one month prior to that.

In other words, for user_id 50245, I want his/her data from "2017-12-01 11:50:23.965134" to "2017-12-31 11:50:23.965134"; for user_id 76766, I want his/her data from "2015-09-07 22:40:59.124553" to "2015-10-07 22:40:59.124553".

Hence the desired result looks like this:

| id | user_id |        recorded_at       | record |
|----|---------|--------------------------|--------|
| 21 |  50245  |2017-12-05 10:00:33.135331|   97   |
| 22 |  50245  |2017-12-31 11:50:23.965134|   80   |
| 23 |  76766  |2015-10-06 11:00:14.902452|   70   |
| 24 |  76766  |2015-10-07 22:40:59.124553|   81   |

Please note that the record of id 20 is not included because it is more than one month prior to user_id 50245's last record.

Is there any way I can write an SQL query to achieve this?
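
For anyone who wants to reproduce the problem, the sample data can be loaded with a script along these lines. This is only a sketch: the column types are assumptions inferred from the values shown above, not taken from the real schema.

```sql
-- Hypothetical DDL: types are guessed from the sample rows in the question.
CREATE TABLE diary (
   id          integer   PRIMARY KEY,
   user_id     integer   NOT NULL,
   recorded_at timestamp NOT NULL,
   record      integer
);

-- The five sample rows from the question.
INSERT INTO diary (id, user_id, recorded_at, record) VALUES
   (20, 50245, '2017-10-01 23:00:14.765366', 89),
   (21, 50245, '2017-12-05 10:00:33.135331', 97),
   (22, 50245, '2017-12-31 11:50:23.965134', 80),
   (23, 76766, '2015-10-06 11:00:14.902452', 70),
   (24, 76766, '2015-10-07 22:40:59.124553', 81);
```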


Answer 1:


For small tables, any (valid) query technique is good.

For big tables, details matter. Assuming:

  • There is also a users table with user_id as PK containing all relevant users (or possibly a few more). This is the typical setup.

  • You have (or can create) an index on diary (user_id, recorded_at DESC NULLS LAST). NULLS LAST is optional if recorded_at is defined NOT NULL. But make sure the query matches the index.

  • More than a few rows per user - the typical use case.
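
The index from the second assumption could be created roughly as follows. The index name is hypothetical; only the column list and sort order come from the text above.

```sql
-- Hypothetical index name; the columns and ordering follow the assumption above.
-- NULLS LAST can be dropped if recorded_at is declared NOT NULL,
-- as long as the query's ORDER BY is written the same way.
CREATE INDEX diary_user_id_recorded_at_idx
    ON diary (user_id, recorded_at DESC NULLS LAST);
```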

This should be among the fastest options:

SELECT d.*
FROM   users u
CROSS  JOIN LATERAL (
   SELECT recorded_at
   FROM   diary
   WHERE  user_id = u.user_id
   ORDER  BY recorded_at DESC NULLS LAST
   LIMIT 1
   ) d1
JOIN   diary d ON d.user_id = u.user_id
              AND d.recorded_at >= d1.recorded_at - interval '1 month'
ORDER  BY d.user_id, d.recorded_at;

Produces your desired result exactly.
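
One detail worth checking is how PostgreSQL computes the lower bound. Subtracting interval '1 month' moves the month back and clamps the day to the last valid day of the target month. A quick check, using the two latest timestamps from the sample data:

```sql
-- Show the start of the one-month window for each user's latest timestamp.
SELECT ts                       AS last_recorded_at,
       ts - interval '1 month'  AS window_start
FROM  (VALUES (timestamp '2017-12-31 11:50:23.965134'),
              (timestamp '2015-10-07 22:40:59.124553')) AS t(ts);
```

Because November has only 30 days, the first window should start at 2017-11-30 11:50:23.965134 rather than on 2017-12-01 as written in the question. The rows returned for the sample data are the same either way.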

For only a few rows per user, max() or DISTINCT ON () in a subquery are typically faster.
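
As a sketch of the DISTINCT ON variant mentioned here (the aliases l and last_recorded_at are made up for illustration), the latest timestamp per user is computed once in a subquery and joined back to the table:

```sql
SELECT d.*
FROM  (
   -- One row per user: the most recent recorded_at.
   SELECT DISTINCT ON (user_id)
          user_id, recorded_at AS last_recorded_at
   FROM   diary
   ORDER  BY user_id, recorded_at DESC NULLS LAST
   ) l
JOIN   diary d ON d.user_id = l.user_id
              AND d.recorded_at >= l.last_recorded_at - interval '1 month'
ORDER  BY d.user_id, d.recorded_at;
```

This variant does not need the users table, at the cost of reading all of diary to find each user's latest row.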

Related (with detailed explanation):

  • Optimize GROUP BY query to retrieve latest record per user
  • Select first row in each GROUP BY group?
  • What is the difference between LATERAL and a subquery in PostgreSQL?

About the FROM clause:

  • Start with the manual
  • Why does this implicit join get planned differently than an explicit join?
  • What does [FROM x, y] mean in Postgres?



Answer 2:


I would be inclined to use window functions:

select d.*
from (select d.*, max(d.recorded_at) over (partition by d.user_id) as max_recorded_at
      from diary d
     ) d
where recorded_at >= max_recorded_at - interval '1 month';



Answer 3:


The straightforward way is to use a subquery to get the max recorded_at for each user_id and then join:

select d.*
  from diary d
       join ( select user_id, max(recorded_at) mra
                from diary
               group by user_id ) m on d.user_id = m.user_id
 where m.mra <= d.recorded_at + interval '1 month'

This has the drawback of accessing the table twice (this may differ between RDBMSs; use EXPLAIN to see the execution plan).
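
In PostgreSQL, checking the plan for the join query could look like the sketch below. Note that the ANALYZE option actually executes the statement, so plain EXPLAIN is the safer choice on a busy system.

```sql
-- ANALYZE runs the query and reports actual row counts and timings;
-- BUFFERS adds block-level I/O statistics to the plan output.
explain (analyze, buffers)
select d.*
  from diary d
       join ( select user_id, max(recorded_at) mra
                from diary
               group by user_id ) m on d.user_id = m.user_id
 where m.mra <= d.recorded_at + interval '1 month';
```

Two scan nodes on diary in the output would confirm that the table is read twice.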

A better alternative is to use window functions to do everything in one pass:

select id, user_id, recorded_at
  from ( select *, max(recorded_at) over (partition by user_id) as mra
           from diary ) x
 where mra <= recorded_at + interval '1 month'

Disclaimer: I did not test the queries above, but you should get the idea anyway. See http://sqlfiddle.com/#!17/e90000/9 for a working example with a similar schema.




Answer 4:


Not tested, but something like this should work.

I would use a subquery to get each user's last record, then keep the rows at that timestamp plus those from the previous month, for example:

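select d.*
from diary d
-- carry user_id through the subquery so each row is compared only with its own user's last record
join (select user_id, max(recorded_at) as l from diary group by user_id) as last_record
  on d.user_id = last_record.user_id
where  d.recorded_at = last_record.l
or
  (
   -- date_trunc moves the lower bound to the start of the calendar month containing (l - 1 month)
   d.recorded_at  >= date_trunc('month', last_record.l - interval '1' month)
   and d.recorded_at  < last_record.l
  )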


Source: https://stackoverflow.com/questions/48345520/select-data-within-one-month-prior-to-each-users-last-record
