Using Impala, get the count of consecutive trips

Submitted by 杀马特。学长 韩版系。学妹 on 2020-01-14 04:28:09

Question


Sample Data

touristid|day
ABC|1
ABC|1
ABC|2
ABC|4
ABC|5
ABC|6
ABC|8
ABC|10

The output should be

touristid|trip
ABC|4

The logic behind 4: it is the count of distinct runs of consecutive days. Days 1, 1, 2 form the 1st trip, days 4, 5, 6 the 2nd, day 8 the 3rd, and day 10 the 4th. I want to produce this output with an Impala query.


Answer 1:


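Get the previous day with the lag() function, then set new_trip_flag when day - prev_day > 1 or when prev_day is NULL (the first row for a tourist), and finally count(new_trip_flag). Because count() ignores NULLs, only the rows that start a new trip are counted; a repeated day gives a difference of 0 and does not start a new trip.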

Demo:

with table1 as (
select 'ABC' as touristid, 1  as day union all
select 'ABC' as touristid, 1  as day union all
select 'ABC' as touristid, 2  as day union all
select 'ABC' as touristid, 4  as day union all
select 'ABC' as touristid, 5  as day union all
select 'ABC' as touristid, 6  as day union all
select 'ABC' as touristid, 8  as day union all
select 'ABC' as touristid, 10 as day 
)

select touristid, count(new_trip_flag) trip_cnt
  from 
       ( -- calculate new_trip_flag
         select touristid,
                case when (day-prev_day) > 1 or prev_day is NULL then true end  new_trip_flag
           from       
                ( -- get prev_day
                  select touristid, day, 
                         lag(day) over(partition by touristid order by day) prev_day
                    from table1
                )s
        )s
 group by touristid;

Result:

touristid   trip_cnt    
ABC         4   

The same will work in Hive also.
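
To sanity-check the logic outside Impala, the lag-and-flag steps can be mirrored in a few lines of Python. This is only an illustrative sketch: the function name count_trips and the in-memory sample rows are assumptions made for the example, not part of the original answer.

```python
from itertools import groupby
from operator import itemgetter


def count_trips(rows):
    """Count runs of consecutive days per tourist, mirroring the lag() query.

    rows: iterable of (touristid, day) pairs; duplicate days are allowed.
    count_trips is a hypothetical helper name used only for this sketch.
    """
    trips = {}
    # Sorting stands in for PARTITION BY touristid ORDER BY day.
    for tourist, group in groupby(sorted(rows), key=itemgetter(0)):
        prev_day = None  # plays the role of lag(day); None on the first row
        count = 0
        for _, day in group:
            # Same condition as new_trip_flag in the SQL.
            if prev_day is None or day - prev_day > 1:
                count += 1
            prev_day = day
        trips[tourist] = count
    return trips


sample = [("ABC", d) for d in (1, 1, 2, 4, 5, 6, 8, 10)]
print(count_trips(sample))  # {'ABC': 4}
```

The repeated day 1 yields a difference of 0, so it does not open a new trip, which matches the behaviour of the query.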



Source: https://stackoverflow.com/questions/58240724/using-impala-get-the-count-of-consecutive-trips
