SQL to determine minimum sequential days of access?

我在风中等你 2020-12-04 04:58

The following User History table contains one record for every day a given user has accessed a website (in a 24 hour UTC period). It has many thousands of r

19 answers
  • 2020-12-04 05:22
    declare @startdate as datetime, @days as int
    set @startdate = cast('11 Jan 2009' as datetime) -- The startdate
    set @days = 5 -- The number of consecutive days
    
    SELECT userid
          ,count(1) as [Number of Consecutive Days]
    FROM UserHistory
    WHERE creationdate >= @startdate
    AND creationdate < dateadd(dd, @days, cast(convert(char(11), @startdate, 113)  as datetime))
    GROUP BY userid
    HAVING count(1) >= @days
    

    The expression cast(convert(char(11), @startdate, 113) as datetime) removes the time part of the date, so the window starts at midnight.

    I would assume also that the creationdate and userid columns are indexed.

    I just realized that this won't tell you all the users and their total consecutive days, but it will tell you which users visited on every one of a set number of days, starting from a date of your choosing.

    Revised solution:

    declare @days as int
    set @days = 30
    select t1.userid
    from UserHistory t1
    where (select count(1) 
           from UserHistory t3 
           where t3.userid = t1.userid
           and t3.creationdate >= DATEADD(dd, DATEDIFF(dd, 0, t1.creationdate), 0) 
           and t3.creationdate < DATEADD(dd, DATEDIFF(dd, 0, t1.creationdate) + @days, 0) 
           group by t3.userid
    ) >= @days
    group by t1.userid
    

    I've checked this and it will query for all users and all dates. It is based on Spencer's 1st (joke?) solution, but mine works.
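For readers who want to check the logic outside the database, here is a minimal Python sketch of the same window-count idea: for each access date of a user, count that user's accesses in the `days`-day window starting on that date. The function name `users_with_streak` and the in-memory `(userid, date)` layout are assumptions made for illustration. Unlike the query, the sketch collapses duplicate dates, so it does not depend on there being only one record per user per day.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date, timedelta


def users_with_streak(rows, days):
    """Return the set of user ids with at least `days` consecutive access days.

    `rows` is an iterable of (userid, date) pairs (hypothetical layout).
    """
    by_user = defaultdict(set)
    for userid, day in rows:
        by_user[userid].add(day)  # a set collapses duplicate dates

    result = set()
    for userid, day_set in by_user.items():
        ordered = sorted(day_set)
        for i, start in enumerate(ordered):
            # Index of the first access on or after start + days.
            end = bisect_left(ordered, start + timedelta(days=days))
            # `days` distinct dates inside a `days`-day window means no gaps.
            if end - i >= days:
                result.add(userid)
                break
    return result
```

A call such as `users_with_streak(rows, 30)` plays the role of running the query with `@days = 30`.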

    Update: improved the date handling in the second solution.

  • 2020-12-04 05:23

    Doing this with a single SQL query seems overly complicated to me. Let me break this answer down into two parts.

    1. What you should have done until now and should start doing now:
      Run a daily cron job that checks, for every user, whether they have logged in today, then increments a counter if they have or resets it to 0 if they haven't.
    2. What you should do now:
      - Export this table to a server that doesn't run your website and won't be needed for a while. ;)
      - Sort it by user, then date.
      - go through it sequentially, keep a counter...
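The sort-then-scan steps above can be sketched in a few lines of Python. This is only an illustration under assumed names: `longest_streaks` and the `(userid, date)` tuple layout are hypothetical, and the exported table is assumed to fit in memory.

```python
from datetime import date, timedelta


def longest_streaks(rows):
    """Map each user id to the length of their longest run of consecutive days.

    `rows` is an iterable of (userid, date) pairs in any order.
    """
    longest = {}
    prev_user = prev_day = None
    counter = 0
    # Sort by user, then date, and walk the rows once.
    for userid, day in sorted(set(rows)):
        if userid == prev_user and day == prev_day + timedelta(days=1):
            counter += 1  # the streak continues
        else:
            counter = 1   # new user or a gap: restart the counter
        longest[userid] = max(longest.get(userid, 0), counter)
        prev_user, prev_day = userid, day
    return longest
```

Filtering the returned dictionary for values of at least N then yields the users with N consecutive days.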
  • 2020-12-04 05:24

    A couple of SQL Server 2012 options (assuming N=100 below).

    ;WITH T(UserID, NRowsPrevious)
         AS (SELECT UserID,
                    DATEDIFF(DAY, 
                            LAG(CreationDate, 100) 
                                OVER 
                                    (PARTITION BY UserID 
                                         ORDER BY CreationDate), 
                             CreationDate)
             FROM   UserHistory)
    SELECT DISTINCT UserID
    FROM   T
    WHERE  NRowsPrevious = 100 
    

    With my sample data, though, the following turned out to be more efficient:

    ;WITH U
         AS (SELECT DISTINCT UserId
             FROM   UserHistory) /*Ideally replace with Users table*/
    SELECT UserId
    FROM   U
           CROSS APPLY (SELECT TOP 1 *
                        FROM   (SELECT 
                                       DATEDIFF(DAY, 
                                                LAG(CreationDate, 100) 
                                                  OVER 
                                                   (ORDER BY CreationDate), 
                                                 CreationDate)
                                FROM   UserHistory UH
                                WHERE  U.UserId = UH.UserID) T(NRowsPrevious)
                        WHERE  NRowsPrevious = 100) O
    

    Both rely on the constraint stated in the question that there is at most one record per day per user.
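Why the one-record-per-day constraint matters can be seen in a small Python sketch of the LAG comparison. With distinct, sorted dates, the row `offset` positions earlier is at least `offset` days older, so a difference of exactly `offset` days means there are no gaps in between. Note that such a match covers `offset + 1` calendar days, endpoints included. The function name and the list-of-dates input are assumptions for illustration only.

```python
from datetime import date, timedelta


def has_run_via_lag(days_accessed, offset):
    """True if some date lies exactly `offset` days after the date `offset` rows earlier.

    `days_accessed` must hold distinct calendar dates for a single user.
    A match means offset + 1 consecutive days of access.
    """
    ordered = sorted(days_accessed)
    for i in range(offset, len(ordered)):
        lagged = ordered[i - offset]  # what LAG(CreationDate, offset) would return
        if (ordered[i] - lagged).days == offset:
            return True
    return False
```

If duplicate dates were allowed, the lagged row could be closer than `offset` distinct days away, and the equality test would miss real streaks.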

  • 2020-12-04 05:26

    How about one using a tally table? It follows a more algorithmic approach, and the execution plan is a breeze. Populate the tally table with the numbers from 1 to 'MaxDaysBehind', the number of days back you want to scan (e.g. 90 will look about 3 months back).

    declare @ContinuousDays int
    set @ContinuousDays = 30  -- select those that have 30 consecutive days
    
    create table #tallyTable (Tally int)
    insert into #tallyTable values (1)
    ...
    insert into #tallyTable values (90) -- insert numbers for as many days behind as you want to scan
    
    -- each tally value shifts the @ContinuousDays-wide window one more day into the past
    select [UserId], count(*), t.Tally
    from HistoryTable
    cross join #tallyTable as t
    where [CreationDate] > getdate() - @ContinuousDays - t.Tally
      and [CreationDate] < getdate() - t.Tally
    group by [UserId], t.Tally
    having count(*) >= @ContinuousDays
    
    drop table #tallyTable -- remove the temp table instead of only emptying it
    
  • 2020-12-04 05:27

    If this is so important to you, source this event and drive a table to give you this info. No need to kill the machine with all those crazy queries.
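One way to read this suggestion is to update a small per-user streak record every time an access event arrives, so that no historical scan is needed. The Python sketch below uses an in-memory dictionary as a stand-in for such a table. The class name, its fields, and the assumption that events arrive in chronological order per user are all hypothetical and do not come from the original answer.

```python
from datetime import date, timedelta


class StreakTable:
    """In-memory stand-in for a per-user streak table (hypothetical schema)."""

    def __init__(self):
        # userid -> (last access day, current streak, longest streak)
        self.rows = {}

    def record_access(self, userid, day):
        """Apply one access event; events are assumed to arrive in date order."""
        last_day, current, longest = self.rows.get(userid, (None, 0, 0))
        if last_day == day:
            return  # this day was already counted
        if last_day is not None and day == last_day + timedelta(days=1):
            current += 1  # consecutive day: extend the streak
        else:
            current = 1   # first visit or a gap: start over
        self.rows[userid] = (day, current, max(longest, current))

    def users_with_streak(self, days):
        """Users whose longest streak so far is at least `days`."""
        return {u for u, (_, _, longest) in self.rows.items() if longest >= days}
```

The query for N consecutive days then becomes a cheap lookup on the stored longest streak.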

  • 2020-12-04 05:28

    You could use a recursive CTE (SQL Server 2005+):

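    WITH recur_date AS (
            -- anchor: every access row starts a potential streak of length 1
            SELECT t.userid,
                   t.creationDate,
                   DATEADD(day, 1, t.creationDate) AS nextDay,
                   1 AS [level]
              FROM UserHistory t
             UNION ALL
            -- recursive step: extend a streak when the same user has a row on the next day
            SELECT t.userid,
                   t.creationDate,
                   DATEADD(day, 1, t.creationDate) AS nextDay,
                   rd.[level] + 1 AS [level]
              FROM UserHistory t
              JOIN recur_date rd ON t.creationDate = rd.nextDay AND t.userid = rd.userid)
       SELECT t.*
        FROM recur_date t
       WHERE t.[level] = @numDays
    ORDER BY t.userid
    OPTION (MAXRECURSION 0) -- lift the default limit of 100 recursion levels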
    