Count number of rows that are not within 10 seconds of each other

情歌与酒 2021-02-01 09:26

I track web visitors. I store the IP address as well as the timestamp of the visit.

ip_address    time_stamp
180.2.79.3  1301654105
180.2.79.3  1301654106
180.2.         


        
8 answers
  •  轮回少年
    2021-02-01 10:02

    Let me start with this table. I'll use ordinary timestamps so we can easily see what's going on.

    180.2.79.3   2011-01-01 08:00:00
    180.2.79.3   2011-01-01 08:00:09
    180.2.79.3   2011-01-01 08:00:20
    180.2.79.3   2011-01-01 08:00:23
    180.2.79.3   2011-01-01 08:00:25
    180.2.79.3   2011-01-01 08:00:40
    180.2.79.4   2011-01-01 08:00:00
    180.2.79.4   2011-01-01 08:00:13
    180.2.79.4   2011-01-01 08:00:23
    180.2.79.4   2011-01-01 08:00:25
    180.2.79.4   2011-01-01 08:00:27
    180.2.79.4   2011-01-01 08:00:29
    180.2.79.4   2011-01-01 08:00:50
    

    If I understand you correctly, you want to count each run of page views that are no more than 10 seconds apart as a single visit, like this.

    180.2.79.3   3
    180.2.79.4   3
    

    You can do that for each ip_address by selecting, for every row, the latest timestamp that is both

    • later than the current row's timestamp, and
    • no more than 10 seconds later than the current row's timestamp.

    Taking these two criteria together will introduce some nulls, which turn out to be really useful.

    -- For each row, find the latest later timestamp from the same
    -- ip_address that is at most 10 seconds after it (null if none).
    select t_s.ip_address, 
           t_s.time_stamp, 
           (select max(t.time_stamp) 
            from t_s t 
            where t.ip_address = t_s.ip_address 
              and t.time_stamp > t_s.time_stamp
              and t.time_stamp - t_s.time_stamp <= interval '10' second) as next_page
    from t_s 
    group by t_s.ip_address, t_s.time_stamp
    order by t_s.ip_address, t_s.time_stamp;
    
    ip_address   time_stamp            next_page
    180.2.79.3   2011-01-01 08:00:00   2011-01-01 08:00:09
    180.2.79.3   2011-01-01 08:00:09   
    180.2.79.3   2011-01-01 08:00:20   2011-01-01 08:00:25
    180.2.79.3   2011-01-01 08:00:23   2011-01-01 08:00:25
    180.2.79.3   2011-01-01 08:00:25   
    180.2.79.3   2011-01-01 08:00:40   
    180.2.79.4   2011-01-01 08:00:00   
    180.2.79.4   2011-01-01 08:00:13   2011-01-01 08:00:23
    180.2.79.4   2011-01-01 08:00:23   2011-01-01 08:00:29
    180.2.79.4   2011-01-01 08:00:25   2011-01-01 08:00:29
    180.2.79.4   2011-01-01 08:00:27   2011-01-01 08:00:29
    180.2.79.4   2011-01-01 08:00:29   
    180.2.79.4   2011-01-01 08:00:50   
    

    The timestamp that marks the end of a visit has a null for its own next_page. That's because, for that row, no later timestamp from the same ip_address falls within 10 seconds of its time_stamp.

    To get a count, I'd probably create a view and count the nulls.
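
    The view itself isn't spelled out here, so this is a minimal sketch of what it might look like, assuming it simply wraps the query above under the name t_s_visits (the name the counting query below expects). The alias v is only for illustration, and the order by is dropped because a view doesn't need one.

    ```sql
    -- Hypothetical definition of t_s_visits: the earlier query, saved as a view.
    create view t_s_visits as
    select v.ip_address,
           v.time_stamp,
           -- latest later page view within 10 seconds, or null at the end of a visit
           (select max(t.time_stamp)
            from t_s t
            where t.ip_address = v.ip_address
              and t.time_stamp > v.time_stamp
              and t.time_stamp - v.time_stamp <= interval '10' second) as next_page
    from t_s v
    group by v.ip_address, v.time_stamp;
    ```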

    select ip_address, count(*)
    from t_s_visits 
    where next_page is null
    group by ip_address
    order by ip_address;
    
    180.2.79.3   3
    180.2.79.4   3
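
    If your database supports window functions, a different technique gets the same counts without the correlated subquery: count the rows that start a visit instead of the rows that end one. This is only a sketch of that alternative, not the approach above. It assumes lag() is available (it is in PostgreSQL 8.4 and later), and the names gaps, prev_page and visits are made up for the example.

    ```sql
    -- A row starts a visit when it has no previous row for the same
    -- ip_address, or when the previous row is more than 10 seconds older.
    select ip_address, count(*) as visits
    from (
        select ip_address,
               time_stamp,
               lag(time_stamp) over (partition by ip_address
                                     order by time_stamp) as prev_page
        from t_s
    ) gaps
    where prev_page is null
       or time_stamp - prev_page > interval '10' second
    group by ip_address
    order by ip_address;
    ```

    On the sample table this also gives 3 for each ip_address, because every visit has exactly one first row and exactly one last row.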
    
