create id column based on activity data

一世执手 提交于 2019-12-10 19:06:46

问题


I have a table EVENTS

USER  EVENT_TS             EVENT_TYPE
abc   2016-01-01 08:00:00  Login
abc   2016-01-01 08:25:00  Stuff
abc   2016-01-01 10:00:00  Stuff
abc   2016-01-01 14:00:00  Login
xyz   2015-12-31 18:00:00  Login
xyz   2016-01-01 08:00:00  Logout

What I need to do is produce a session field for each period of activity for each user. In addition, if the user has been idle for a period equal to or longer than p_timeout (1 hour in this case) then a new session starts at the next activity. Users don't always log out cleanly, so the logout isn't walways there...

Notes:

Logout always terminates a session
There doesn't have to be a logout or a login (because software)
Login is always a new session

Output like

USER  EVENT_TS             EVENT_TYPE  SESSION
abc   2016-01-01 08:00:00  Login       1
abc   2016-01-01 08:25:00  Stuff       1
abc   2016-01-01 10:00:00  Stuff       2
abc   2016-01-01 14:00:00  Login       3
xyz   2015-12-31 18:00:00  Login       1
xyz   2016-01-01 08:00:00  Logout      1

Any thoughts on how to acheive this?


回答1:


I think this may do what you need. I changed "user" to "usr" in the input, and "session" to "sess" in the output - I don't ever use reserved Oracle words for object names.

Note: as Boneist pointed out below, my solution will assign a session number of 0 to the first session, if it is a Logout event (or a succession of Logouts right at the top). If this situation can occur in the data, and if the desired behavior is to start session counts at 1 even in that case, then the definition of flag must be tweaked - for example, by making flag = 1 when lag(event_ts) over (partition by usr order by event_ts) is null as well.

Good luck!

with
     events ( usr, event_ts, event_type ) as (
       select 'abc', to_timestamp('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Login' from dual union all
       select 'abc', to_timestamp('2016-01-01 08:25:00', 'yyyy-mm-dd hh24:mi:ss'), 'Stuff' from dual union all
       select 'abc', to_timestamp('2016-01-01 10:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Stuff' from dual union all
       select 'abc', to_timestamp('2016-01-01 14:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Login' from dual union all
       select 'xyz', to_timestamp('2015-12-31 18:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Login' from dual union all
       select 'xyz', to_timestamp('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Logout' from dual
     ),
     start_of_sess ( usr, event_ts, event_type, flag ) as (
       select usr, event_ts, event_type,
              case when event_type != 'Logout' 
                    and
                        (    event_ts >= lag(event_ts) over (partition by usr 
                                                             order by event_ts) + 1/24 
                          or event_type = 'Login'
                          or lag(event_type) over (partition by usr 
                                                   order by event_ts) = 'Logout'
                        )
                   then 1 end
       from   events
     )
select usr, event_ts, event_type,
       count(flag) over (partition by usr order by event_ts) as sess
from   start_of_sess
; 

Output (timestamps use my current NLS_TIMESTAMP_FORMAT setting):

USR EVENT_TS                          EVENT_TYPE   SESS
--- --------------------------------- ---------- ------
abc 01-JAN-2016 08.00.00.000000000 AM Login           1
abc 01-JAN-2016 08.25.00.000000000 AM Stuff           1
abc 01-JAN-2016 10.00.00.000000000 AM Stuff           2
abc 01-JAN-2016 02.00.00.000000000 PM Login           3
xyz 31-DEC-2015 06.00.00.000000000 PM Login           1
xyz 01-JAN-2016 08.00.00.000000000 AM Logout          1

 6 rows selected     



回答2:


I think this will do the trick:

WITH EVENTS AS (SELECT 'abc' usr, to_date('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'login' event_type FROM dual UNION ALL
                SELECT 'abc' usr, to_date('2016-01-01 08:25:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Stuff' event_type FROM dual UNION ALL
                SELECT 'abc' usr, to_date('2016-01-01 10:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Stuff' event_type FROM dual UNION ALL
                SELECT 'abc' usr, to_date('2016-01-01 14:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'login' event_type FROM dual UNION ALL
                SELECT 'xyz' usr, to_date('2015-12-31 18:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'login' event_type FROM dual UNION ALL
                SELECT 'xyz' usr, to_date('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Logout' event_type FROM dual UNION ALL
                SELECT 'def' usr, to_date('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Logout' event_type FROM dual UNION ALL
                SELECT 'def' usr, to_date('2016-01-01 08:15:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Logout' event_type FROM dual)
SELECT usr,
       event_ts,
       event_type,
       SUM(counter) OVER (PARTITION BY usr ORDER BY event_ts) session_id
FROM   (SELECT usr,
               event_ts,
               event_type,
               CASE WHEN LAG(event_type, 1, 'Logout') OVER (PARTITION BY usr ORDER BY event_ts) = 'Logout' THEN 1
                    WHEN event_type = 'Logout' THEN 0
                    WHEN event_ts - LAG(event_ts) OVER (PARTITION BY usr ORDER BY event_ts) > 1/24 THEN 1
                    WHEN event_type = 'login' THEN 1
                    ELSE 0
               END counter
        FROM   EVENTS);

USR EVENT_TS            EVENT_TYPE SESSION_ID
--- ------------------- ---------- ----------
abc 2016-01-01 08:00:00 login               1
abc 2016-01-01 08:25:00 Stuff               1
abc 2016-01-01 10:00:00 Stuff               2
abc 2016-01-01 14:00:00 login               3
def 2016-01-01 08:00:00 Logout              1
def 2016-01-01 08:15:00 Logout              2
xyz 2015-12-31 18:00:00 login               1
xyz 2016-01-01 08:00:00 Logout              1

This solution relies on the logic-short circuiting that takes place in the CASE expression and the fact that the event_type is not null. It also assumes that multiple logouts in a row are counted as separate sessions:

  1. If the previous row was a logout row (and if there is no previous row - i.e. for the first row in the set - treat it as if a logout row was present), we want to increase the counter by one. (Logouts terminate the session, so we always have a new session following a logout.)
  2. If the current row is a logout, then this terminates the existing session. Therefore, the counter shouldn't be increased.
  3. If the time of the current row is greater than an hour from the previous row, increase the counter by one.
  4. If the current row is a login row, then it's a new session, so increase the counter by one.
  5. For any other case, we don't increase the counter.

Once we've done that, it's just a matter of doing a running total on the counter.




回答3:


For completeness (for users with Oracle 12 or above), here is a solution using MATCH_RECOGNIZE:

select usr, event_ts, event_type, sess
from   events
match_recognize(
  partition by usr
  order by event_ts
  measures match_number() as sess
  all rows per match
  pattern (strt follow*)
  define follow as    event_type = 'Logout'
                   or (     event_type != 'Login'
                        and prev(event_type) != 'Logout'
                        and event_ts < prev(event_ts) + 1/24
                      )
)
;

Here I cover an unusual case: a Logout event following another Logout event. In such cases, I assume all consecutive Logouts, no matter how many and how far apart in time, belong to the same session. (If such cases are guaranteed not to occur in the data, so much the better.)

Please see also the Note I added to my other answer (for Oracle 11 and below) regarding the possibility of the very first event for a usr being a Logout (if that is even possible in the input data).



来源:https://stackoverflow.com/questions/40133743/create-id-column-based-on-activity-data

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!