The following User History table contains one record for every day a given user has accessed a website (in a 24 hour UTC period). It has many thousands of r
declare @startdate as datetime, @days as int
set @startdate = cast('11 Jan 2009' as datetime) -- The startdate
set @days = 5 -- The number of consecutive days

SELECT userid
      ,count(1) as [Number of Consecutive Days]
FROM UserHistory
WHERE creationdate >= @startdate
  AND creationdate < dateadd(dd, @days, cast(convert(char(11), @startdate, 113) as datetime))
GROUP BY userid
HAVING count(1) >= @days
The statement cast(convert(char(11), @startdate, 113) as datetime) removes the time part of the date so we start at midnight.
I would assume also that the creationdate and userid columns are indexed.
I just realized that this won't tell you all the users and their total consecutive days, but it will tell you which users visited on every one of a set number of days, starting from a date of your choosing.
Revised solution:
declare @days as int
set @days = 30

select t1.userid
from UserHistory t1
where (select count(1)
       from UserHistory t3
       where t3.userid = t1.userid
         and t3.creationdate >= DATEADD(dd, DATEDIFF(dd, 0, t1.creationdate), 0)
         and t3.creationdate < DATEADD(dd, DATEDIFF(dd, 0, t1.creationdate) + @days, 0)
       group by t3.userid
      ) >= @days
group by t1.userid
I've checked this and it will query for all users and all dates. It is based on Spencer's 1st (joke?) solution, but mine works.
Update: improved the date handling in the second solution.
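To make the counting logic of the revised query easier to poke at, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite only stands in for SQL Server so the example is self-contained: `date(x, '+N days')` replaces the `DATEADD`/`DATEDIFF` arithmetic, dates are stored as ISO text, and the function name `users_with_run` and the sample rows are made up for illustration. Like the original, it assumes at most one row per user per day.

```python
import sqlite3

def users_with_run(rows, days):
    """Return user ids that have `days` rows inside some `days`-day window."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE UserHistory (userid INTEGER, creationdate TEXT)")
    con.executemany("INSERT INTO UserHistory VALUES (?, ?)", rows)
    sql = """
        SELECT t1.userid
        FROM UserHistory t1
        WHERE (SELECT COUNT(*)
               FROM UserHistory t3
               WHERE t3.userid = t1.userid
                 -- window starts at midnight of t1's day ...
                 AND t3.creationdate >= date(t1.creationdate)
                 -- ... and ends `days` days later (exclusive)
                 AND t3.creationdate <  date(t1.creationdate, '+' || :days || ' days')
              ) >= :days
        GROUP BY t1.userid
        ORDER BY t1.userid
    """
    found = [r[0] for r in con.execute(sql, {"days": days})]
    con.close()
    return found

# Hypothetical sample data: user 1 visits three days in a row, user 2 skips a day.
sample = [
    (1, "2009-01-11"), (1, "2009-01-12"), (1, "2009-01-13"),
    (2, "2009-01-11"), (2, "2009-01-13"), (2, "2009-01-14"),
]
print(users_with_run(sample, 3))
```

With one row per day, a window of `days` days can only contain `days` rows if every day in it has a visit, which is why a plain count is enough.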
Doing this with a single SQL query seems overly complicated to me. Let me break this answer down in two parts.
A couple of SQL Server 2012 options (assuming N=100 below).
;WITH T(UserID, NRowsPrevious)
     AS (SELECT UserID,
                DATEDIFF(DAY,
                         LAG(CreationDate, 100)
                             OVER
                                 (PARTITION BY UserID
                                  ORDER BY CreationDate),
                         CreationDate)
         FROM   UserHistory)
SELECT DISTINCT UserID
FROM   T
WHERE  NRowsPrevious = 100
Though with my sample data the following worked out to be more efficient:
;WITH U
     AS (SELECT DISTINCT UserId
         FROM   UserHistory) /*Ideally replace with Users table*/
SELECT UserId
FROM   U
       CROSS APPLY (SELECT TOP 1 *
                    FROM   (SELECT
                                   DATEDIFF(DAY,
                                            LAG(CreationDate, 100)
                                                OVER
                                                    (ORDER BY CreationDate),
                                            CreationDate)
                            FROM   UserHistory UH
                            WHERE  U.UserId = UH.UserID) T(NRowsPrevious)
                    WHERE  NRowsPrevious = 100) O
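Both rely on the constraint stated in the question that there is at most one record per day per user. Note that a row lying exactly 100 days after the row 100 positions before it means 101 days in a row, so use 99 in both places if N is meant to count the days themselves.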
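For anyone who wants to see why the LAG trick works without a SQL Server 2012 instance at hand, here is a small sketch of the first query run through Python's `sqlite3` module. This is an approximation under stated assumptions, not the original T-SQL: it needs an SQLite build with window functions (3.25 or later), `julianday` subtraction stands in for `DATEDIFF(DAY, ...)`, dates are stored as ISO text with no time part, and `users_by_lag` plus the sample rows are invented for the example.

```python
import sqlite3

def users_by_lag(rows, offset):
    """Return user ids where some row is exactly `offset` days after the
    row `offset` positions before it, i.e. `offset + 1` days in a row."""
    offset = int(offset)  # inlined into the SQL text below, so force an integer
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE UserHistory (UserID INTEGER, CreationDate TEXT)")
    con.executemany("INSERT INTO UserHistory VALUES (?, ?)", rows)
    sql = f"""
        WITH T(UserID, NRowsPrevious) AS (
            SELECT UserID,
                   CAST(julianday(CreationDate)
                        - julianday(LAG(CreationDate, {offset})
                                    OVER (PARTITION BY UserID
                                          ORDER BY CreationDate))
                        AS INTEGER)
            FROM UserHistory)
        SELECT DISTINCT UserID
        FROM T
        WHERE NRowsPrevious = {offset}
        ORDER BY UserID
    """
    found = [r[0] for r in con.execute(sql)]
    con.close()
    return found

# Hypothetical data: user 1 has four days in a row, user 2 has a gap on the 13th.
sample = [
    (1, "2009-01-11"), (1, "2009-01-12"), (1, "2009-01-13"), (1, "2009-01-14"),
    (2, "2009-01-11"), (2, "2009-01-12"), (2, "2009-01-14"), (2, "2009-01-15"),
]
print(users_by_lag(sample, 3))
```

If any day is missing between two rows that are `offset` positions apart, the date difference is larger than `offset`, so the equality test fails. Rows with fewer than `offset` predecessors get NULL from LAG and drop out on their own.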
How about one using tally tables? It follows a more algorithmic approach, and the execution plan is a breeze. Populate the tally table with the numbers from 1 to 'MaxDaysBehind', the number of days back that you want to scan (e.g. 90 will look about 3 months back).
declare @ContinousDays int
set @ContinousDays = 30 -- select those that have 30 consecutive days

create table #tallyTable (Tally int)
insert into #tallyTable values (1)
...
insert into #tallyTable values (90) -- insert numbers for as many days behind as you want to scan

-- each Tally value shifts a window of @ContinousDays days further into the past
select [UserId], count(*), t.Tally
from HistoryTable
join #tallyTable as t on t.Tally > 0
where [CreationDate] > getdate() - @ContinousDays - t.Tally
  and [CreationDate] < getdate() - t.Tally
group by [UserId], t.Tally
having count(*) >= @ContinousDays

drop table #tallyTable -- drop rather than delete, so the script can be re-run
If this is so important to you, source this event and drive a table to give you this info. No need to kill the machine with all those crazy queries.
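One way to read "drive a table" is to keep a small per-user summary that is updated once per access event, so the streak question becomes a lookup. The sketch below is only an illustration of that idea in Python: the `StreakTable` class, its three columns (last day seen, current run, longest run) and the sample visits are all assumptions, a dict stands in for the real table, and events are assumed to arrive in date order.

```python
from datetime import date, timedelta

class StreakTable:
    """In-memory stand-in for a per-user summary table (hypothetical schema)."""

    def __init__(self):
        # userid -> (last_day, current_run, longest_run)
        self.rows = {}

    def record_visit(self, userid, day):
        """Update the summary row for one access event."""
        row = self.rows.get(userid)
        if row is None:
            self.rows[userid] = (day, 1, 1)
            return
        last_day, current, longest = row
        if day == last_day:
            return  # already counted this day
        if day == last_day + timedelta(days=1):
            current += 1  # streak continues
        else:
            current = 1   # gap: start a new streak
        self.rows[userid] = (day, current, max(longest, current))

    def users_with_streak(self, n):
        """User ids whose longest run so far is at least n days."""
        return sorted(u for u, (_, _, longest) in self.rows.items()
                      if longest >= n)

streaks = StreakTable()
for userid, day in [
    (1, date(2009, 1, 11)), (2, date(2009, 1, 11)),
    (1, date(2009, 1, 12)),
    (1, date(2009, 1, 13)), (2, date(2009, 1, 13)),
]:
    streaks.record_visit(userid, day)
print(streaks.users_with_streak(3))
```

The cost moves to write time: each event touches one row, and no query has to scan the history.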
You could use a recursive CTE (SQL Server 2005+):
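DECLARE @numDays int
SET @numDays = 30; -- example value: length of the streak to look for

-- Assumes creationDate carries no time-of-day part, since the join is an exact match.
-- The default recursion limit is 100 levels; longer streaks need OPTION (MAXRECURSION n).
WITH recur_date AS (
    -- anchor: every visit starts a streak of length 1
    SELECT t.userid,
           t.creationDate,
           DATEADD(day, 1, t.creationDate) AS nextDay,
           1 AS [level]
    FROM UserHistory t
    UNION ALL
    -- recursive step: extend a streak when the same user has a row on the next day
    SELECT t.userid,
           t.creationDate,
           DATEADD(day, 1, t.creationDate) AS nextDay,
           rd.[level] + 1 AS [level]
    FROM UserHistory t
    JOIN recur_date rd ON t.creationDate = rd.nextDay AND t.userid = rd.userid)
SELECT t.*
FROM recur_date t
WHERE t.[level] = @numDays
ORDER BY t.userid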
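Here is a rough, runnable version of the same recursion, using SQLite through Python so it can be tried without a server. It is a sketch under assumptions rather than the T-SQL above: SQLite wants `WITH RECURSIVE`, `date(x, '+1 day')` replaces `DATEADD`, dates are ISO text without a time part, and the name `users_by_recursion` and the sample rows are made up.

```python
import sqlite3

def users_by_recursion(rows, num_days):
    """Return user ids that have a streak of at least `num_days` days."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE UserHistory (userid INTEGER, creationDate TEXT)")
    con.executemany("INSERT INTO UserHistory VALUES (?, ?)", rows)
    sql = """
        WITH RECURSIVE recur_date(userid, creationDate, nextDay, level) AS (
            -- anchor: every visit is a streak of length 1
            SELECT userid, creationDate, date(creationDate, '+1 day'), 1
            FROM UserHistory
            UNION ALL
            -- step: a visit on the expected next day extends the streak
            SELECT t.userid, t.creationDate,
                   date(t.creationDate, '+1 day'), rd.level + 1
            FROM UserHistory t
            JOIN recur_date rd
              ON t.creationDate = rd.nextDay AND t.userid = rd.userid)
        SELECT DISTINCT userid
        FROM recur_date
        WHERE level = ?
        ORDER BY userid
    """
    found = [r[0] for r in con.execute(sql, (num_days,))]
    con.close()
    return found

# Hypothetical data: user 1 visits three days in a row, user 2 skips the 12th.
sample = [
    (1, "2009-01-11"), (1, "2009-01-12"), (1, "2009-01-13"),
    (2, "2009-01-11"), (2, "2009-01-13"),
]
print(users_by_recursion(sample, 3))
```

Because every row seeds its own streak, a run of k days produces rows at every level from 1 to k, so filtering on `level = num_days` finds any user with at least that many days in a row. The price is that the CTE generates on the order of k squared rows for a run of length k.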