问题
A simple table:
ForumPost
--------------
ID (int PK)
UserID (int FK)
Date (datetime)
What I'm looking to return how many times a particular user has made at least 1 post a day for n consecutive days.
Example:
User 15844 has posted at least 1 post a day for 30 consecutive days 10 times
I've tagged this question with linq/lambda as well as a solution there would also be great. I know I can solve this by iterating all the users records but this is slow.
回答1:
There is a handy trick you can use using ROW_NUMBER() to find consecutive entries, imagine the following set of dates, with their row_number (starting at 0):
Date RowNumber
20130401 0
20130402 1
20130403 2
20130404 3
20130406 4
20130407 5
For consecutive entries if you subtract the row_number from the value you get the same result. e.g.
Date RowNumber date - row_number
20130401 0 20130401
20130402 1 20130401
20130403 2 20130401
20130404 3 20130401
20130406 4 20130402
20130407 5 20130402
You can then group by date - row_number to get the sets of consecutive days (i.e. the first 4 records, and the last 2 records).
To apply this to your example you would use:
WITH Posts AS
( SELECT FirstPost = DATEADD(DAY, 1 - ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY [Date]), [Date]),
UserID,
Date
FROM ( SELECT DISTINCT UserID, [Date] = CAST(Date AS [Date])
FROM ForumPost
) fp
), Posts2 AS
( SELECT FirstPost,
UserID,
Days = COUNT(*),
LastDate = MAX(Date)
FROM Posts
GROUP BY FirstPost, UserID
)
SELECT UserID, ConsecutiveDates = MAX(Days)
FROM Posts2
GROUP BY UserID;
Example on SQL Fiddle (simple with just most consecutive days per user)
Further example to show how to get all consecutive periods
EDIT
I don't think the above quite answered the question, this will give the number of times a user has posted on, or over n consecutive days:
WITH Posts AS
( SELECT FirstPost = DATEADD(DAY, 1 - ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY [Date]), [Date]),
UserID,
Date
FROM ( SELECT DISTINCT UserID, [Date] = CAST(Date AS [Date])
FROM ForumPost
) fp
), Posts2 AS
( SELECT FirstPost,
UserID,
Days = COUNT(*),
FirstDate = MIN(Date),
LastDate = MAX(Date)
FROM Posts
GROUP BY FirstPost, UserID
)
SELECT UserID, [Times Over N Days] = COUNT(*)
FROM Posts2
WHERE Days >= 30
GROUP BY UserID;
Example on SQL Fiddle
回答2:
Your particular application makes this pretty simple, I think. If you have 'n' distinct dates in an 'n'-day interval, those 'n' distinct dates must be consecutive.
Scroll to the bottom for a general solution that requires only common table expressions and changing to PostgreSQL. (Kidding. I implemented in PostgreSQL, because I'm short of time.)
create table ForumPost (
ID integer primary key,
UserID integer not null,
post_date date not null
);
insert into forumpost values
(1, 1, '2013-01-15'),
(2, 1, '2013-01-16'),
(3, 1, '2013-01-17'),
(4, 1, '2013-01-18'),
(5, 1, '2013-01-19'),
(6, 1, '2013-01-20'),
(7, 1, '2013-01-21'),
(11, 2, '2013-01-15'),
(12, 2, '2013-01-16'),
(13, 2, '2013-01-17'),
(16, 2, '2013-01-17'),
(14, 2, '2013-01-18'),
(15, 2, '2013-01-19'),
(21, 3, '2013-01-17'),
(22, 3, '2013-01-17'),
(23, 3, '2013-01-17'),
(24, 3, '2013-01-17'),
(25, 3, '2013-01-17'),
(26, 3, '2013-01-17'),
(27, 3, '2013-01-17');
Now, let's look at the output of this query. For brevity, I'm looking at 5-day intervals, not 30-day intervals.
select userid, count(distinct post_date) distinct_dates
from forumpost
where post_date between '2013-01-15' and '2013-01-19'
group by userid;
USERID DISTINCT_DATES
1 5
2 5
3 1
For users that fit the criteria, the number of distinct dates in that 5-day interval will have to be 5, right? So we just need to add that logic to a HAVING clause.
select userid, count(distinct post_date) distinct_dates
from forumpost
where post_date between '2013-01-15' and '2013-01-19'
group by userid
having count(distinct post_date) = 5;
USERID DISTINCT_DATES
1 5
2 5
A more general solution
It doesn't really make sense to say that, if you post every day from 2013-01-01 to 2013-01-31, you've posted 30 consecutive days 2 times. Instead, I'd expect the clock to start over on 2013-01-31. My apologies for implementing in PostgreSQL; I'll try to implement in T-SQL later.
with first_posts as (
select userid, min(post_date) first_post_date
from forumpost
group by userid
),
period_intervals as (
select userid, first_post_date period_start,
(first_post_date + interval '4' day)::date period_end
from first_posts
), user_specific_intervals as (
select
userid,
(period_start + (n || ' days')::interval)::date as period_start,
(period_end + (n || ' days')::interval)::date as period_end
from period_intervals, generate_series(0, 30, 5) n
)
select userid, period_start, period_end,
(select count(distinct post_date)
from forumpost
where forumpost.post_date between period_start and period_end
and userid = forumpost.userid) distinct_dates
from user_specific_intervals
order by userid, period_start;
来源:https://stackoverflow.com/questions/16014969/sql-return-consecutive-records