SQL return consecutive records

Posted by ⅰ亾dé卋堺 on 2019-12-11 03:37:34

Question


A simple table:

ForumPost
--------------
ID (int PK)
UserID (int FK)
Date (datetime)

What I'm looking to return is how many times a particular user has made at least 1 post a day for n consecutive days.

Example:

User 15844 has posted at least 1 post a day for 30 consecutive days 10 times

I've also tagged this question with linq/lambda, as a solution there would be great too. I know I can solve this by iterating over all of a user's records, but that is slow.


Answer 1:


There is a handy trick using ROW_NUMBER() to find consecutive entries. Imagine the following set of dates, with their row_number (starting at 0):

Date        RowNumber
20130401    0
20130402    1
20130403    2
20130404    3
20130406    4
20130407    5

For consecutive entries, if you subtract the row_number (as a number of days) from the date, you get the same result for every row in the run, e.g.

Date        RowNumber   date - row_number
20130401    0           20130401
20130402    1           20130401
20130403    2           20130401
20130404    3           20130401
20130406    4           20130402
20130407    5           20130402

You can then group by date - row_number to get the sets of consecutive days (i.e. the first 4 records, and the last 2 records).
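Since the question also asked about a non-SQL approach, here is a minimal Python sketch of the same date minus row number idea, applied to the six dates above. The function name `consecutive_runs` is made up for illustration; it is not part of the original answer.

```python
from datetime import date, timedelta
from itertools import groupby


def consecutive_runs(dates):
    """Split a collection of dates into runs of consecutive days."""
    ordered = sorted(set(dates))  # distinct days, ascending
    # Subtracting the row number (in days) from each date yields the same
    # anchor date for every member of a consecutive run.
    keyed = [(d - timedelta(days=i), d) for i, d in enumerate(ordered)]
    # The anchor never decreases, so equal anchors are always adjacent.
    return [[d for _, d in grp] for _, grp in groupby(keyed, key=lambda p: p[0])]


dates = [date(2013, 4, 1), date(2013, 4, 2), date(2013, 4, 3),
         date(2013, 4, 4), date(2013, 4, 6), date(2013, 4, 7)]
runs = consecutive_runs(dates)
print([len(r) for r in runs])  # [4, 2]
```

The two runs correspond to the first 4 records and the last 2 records in the table above.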

To apply this to your example you would use:

WITH Posts AS
(   -- One row per user per calendar day; FirstPost is the same for every day in a consecutive run
    SELECT  FirstPost = DATEADD(DAY, 1 - ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY [Date]), [Date]),
            UserID,
            [Date]
    FROM    (   SELECT  DISTINCT UserID, [Date] = CAST([Date] AS DATE)
                FROM    ForumPost
            ) fp
), Posts2 AS
(   -- One row per run of consecutive days
    SELECT  FirstPost,
            UserID,
            Days = COUNT(*),
            LastDate = MAX([Date])
    FROM    Posts
    GROUP BY FirstPost, UserID
)
-- Longest run of consecutive days per user
SELECT  UserID, ConsecutiveDates = MAX(Days)
FROM    Posts2
GROUP BY UserID;

Example on SQL Fiddle (simple with just most consecutive days per user)

Further example to show how to get all consecutive periods

EDIT

I don't think the above quite answered the question. The following gives the number of times a user has posted on n or more consecutive days (here n = 30):

WITH Posts AS
(   SELECT  FirstPost = DATEADD(DAY, 1 - ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY [Date]), [Date]),
            UserID,
            Date
    FROM    (   SELECT  DISTINCT UserID, [Date] = CAST(Date AS [Date])
                FROM    ForumPost
            ) fp
), Posts2 AS
(   SELECT  FirstPost, 
            UserID, 
            Days = COUNT(*), 
            FirstDate = MIN(Date), 
            LastDate = MAX(Date)
    FROM    Posts
    GROUP BY FirstPost, UserID
)
SELECT  UserID, [Times Over N Days] = COUNT(*)
FROM    Posts2
WHERE   Days >= 30      -- n = 30; change this to the required number of consecutive days
GROUP BY UserID;

Example on SQL Fiddle
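For comparison, a rough Python sketch of the same counting logic, assuming the posts are available in memory as `(user_id, datetime)` pairs. The function name `times_over_n_days` and the sample data are hypothetical. Like the query, it leaves out users who have no qualifying run.

```python
from collections import Counter
from datetime import datetime, timedelta


def times_over_n_days(posts, n):
    """Map each user id to the number of runs of at least n consecutive days."""
    days_by_user = {}
    for user_id, posted_at in posts:
        # Truncate to the calendar day, like CAST(... AS DATE) plus DISTINCT.
        days_by_user.setdefault(user_id, set()).add(posted_at.date())

    result = {}
    for user_id, days in days_by_user.items():
        # date - row number is constant within a run, so counting how often
        # each value occurs gives the length of every run.
        run_lengths = Counter(
            d - timedelta(days=i) for i, d in enumerate(sorted(days))
        )
        hits = sum(1 for length in run_lengths.values() if length >= n)
        if hits:
            result[user_id] = hits
    return result


def at_noon(day_of_month):
    return datetime(2013, 1, day_of_month, 12, 0)


# User 1 posts on Jan 1-3 and Jan 5-8; user 2 posts twice on Jan 1 and once on Jan 3.
posts = [(1, at_noon(d)) for d in (1, 2, 3, 5, 6, 7, 8)]
posts += [(2, at_noon(1)), (2, at_noon(1)), (2, at_noon(3))]
print(times_over_n_days(posts, 3))  # {1: 2}
```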




Answer 2:


Your particular application makes this pretty simple, I think. If you have 'n' distinct dates in an 'n'-day interval, those 'n' distinct dates must be consecutive.

Scroll to the bottom for a general solution that requires only common table expressions and changing to PostgreSQL. (Kidding. I implemented in PostgreSQL, because I'm short of time.)

create table ForumPost (
  ID integer primary key,
  UserID integer not null,
  post_date date not null
);

insert into forumpost values
(1, 1, '2013-01-15'),
(2, 1, '2013-01-16'),
(3, 1, '2013-01-17'),
(4, 1, '2013-01-18'),
(5, 1, '2013-01-19'),
(6, 1, '2013-01-20'),
(7, 1, '2013-01-21'),

(11, 2, '2013-01-15'),
(12, 2, '2013-01-16'),
(13, 2, '2013-01-17'),
(16, 2, '2013-01-17'),
(14, 2, '2013-01-18'),
(15, 2, '2013-01-19'),

(21, 3, '2013-01-17'),
(22, 3, '2013-01-17'),
(23, 3, '2013-01-17'),
(24, 3, '2013-01-17'),
(25, 3, '2013-01-17'),
(26, 3, '2013-01-17'),
(27, 3, '2013-01-17');

Now, let's look at the output of this query. For brevity, I'm looking at 5-day intervals, not 30-day intervals.

select userid, count(distinct post_date) distinct_dates
from forumpost
where post_date between '2013-01-15' and '2013-01-19'
group by userid;

USERID  DISTINCT_DATES  
1       5
2       5
3       1

For users that fit the criteria, the number of distinct dates in that 5-day interval will have to be 5, right? So we just need to add that logic to a HAVING clause.

select userid, count(distinct post_date) distinct_dates
from forumpost
where post_date between '2013-01-15' and '2013-01-19'
group by userid
having count(distinct post_date) = 5;

USERID  DISTINCT_DATES  
1       5
2       5
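The same check can be sketched outside the database. This is only an illustration using the sample rows above: the function name `users_posting_every_day` is invented here, and it assumes the posts are already loaded as `(user_id, date)` pairs.

```python
from datetime import date


def users_posting_every_day(posts, start, end):
    """Return the user ids that have a post on every day in [start, end]."""
    window_days = (end - start).days + 1  # both endpoints count, like BETWEEN
    distinct = {}
    for user_id, post_date in posts:
        if start <= post_date <= end:
            distinct.setdefault(user_id, set()).add(post_date)
    # n distinct dates inside an n-day window must be consecutive.
    return sorted(u for u, days in distinct.items() if len(days) == window_days)


posts = (
    [(1, date(2013, 1, d)) for d in range(15, 22)]
    + [(2, date(2013, 1, d)) for d in (15, 16, 17, 17, 18, 19)]
    + [(3, date(2013, 1, 17))] * 7
)
print(users_posting_every_day(posts, date(2013, 1, 15), date(2013, 1, 19)))  # [1, 2]
```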

A more general solution

It doesn't really make sense to say that, if you post every day from 2013-01-01 to 2013-01-31, you've posted 30 consecutive days 2 times. Instead, I'd expect the clock to start over on 2013-01-31. My apologies for implementing in PostgreSQL; I'll try to implement in T-SQL later.
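One possible reading of "the clock starts over" is that a run of consecutive days contributes one period for every full n days it contains. The sketch below counts periods that way. It is an assumption about the intended semantics, not a translation of the PostgreSQL query that follows, and the name `full_periods` is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def full_periods(days, n):
    """Count non-overlapping n-day periods across all runs of consecutive days."""
    # date - row number is constant within a run of consecutive days.
    run_lengths = Counter(
        d - timedelta(days=i) for i, d in enumerate(sorted(set(days)))
    )
    # A run of length L holds L // n complete periods; the remainder is dropped.
    return sum(length // n for length in run_lengths.values())


# Posting every day from 2013-01-01 to 2013-01-31 is 31 consecutive days.
january = [date(2013, 1, 1) + timedelta(days=k) for k in range(31)]
print(full_periods(january, 30))  # 1
print(full_periods(january, 5))   # 6
```

Under this reading, the 31-day example counts as one 30-day period rather than two.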

with first_posts as (
  select userid, min(post_date) first_post_date
  from forumpost
  group by userid
), 
period_intervals as (
  select userid, first_post_date period_start, 
         (first_post_date + interval '4' day)::date period_end
  from first_posts
), user_specific_intervals as (
  select 
    userid, 
    (period_start + (n || ' days')::interval)::date as period_start, 
    (period_end + (n || ' days')::interval)::date as period_end 
  from period_intervals, generate_series(0, 30, 5) n
)
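-- Column references in the subquery are qualified: an unqualified userid would
-- resolve to forumpost.userid, making the comparison always true.
select usi.userid, usi.period_start, usi.period_end,
       (select count(distinct fp.post_date)
        from forumpost fp
        where fp.post_date between usi.period_start and usi.period_end
          and fp.userid = usi.userid) distinct_dates
from user_specific_intervals usi
order by usi.userid, usi.period_start;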


Source: https://stackoverflow.com/questions/16014969/sql-return-consecutive-records
