BigQuery and Google Analytics SQL query

Posted by ╄→гoц情女王★ on 2020-12-15 05:19:55

Question


I am trying to build a matrix from a table of Google Analytics data imported into BigQuery. Each row is a hit on the website and carries a session_id along with properties such as the URL and timestamp. Some hits also carry metadata for user-defined actions, which we call events. Below is an example of the table.

session_id  hit_timestamp   url         event_category
1           11:12:23        url134      event1
1           11:14:23        url2234     event2
1           11:16:23        url_target  null
2           03:12:11        url2344     event1
2           03:14:11        url43245    event2
3           09:10:11        url5533     event2
3           09:09:11        url_target  null
4           08:08:08        url64356    event2
4           08:09:08        url56456    event2
4           08:10:08        url_target  null

The intended result should look something like the table below.

session_id  event1  event2  target
1           1       1       1
2           0       0       0
3           0       0       0
4           0       2       1

Note that any event that does not lead to url_target should be counted as zero, and so should the target itself. The query therefore has to compare timestamps to check that an event is followed by a url_target hit. In session 2, for example, the events were never followed by url_target, so the whole row is zeros. The same applies to session 3: the url_target hit has an earlier timestamp than event2, so event2 was not followed by it and the row is zeros.

I would appreciate any help constructing the SQL query that produces this matrix. I was only able to group by session_id and count events with "count", but could not find the right SQL to compare timestamps and check the other fields.


Answer 1:


Use a subquery to calculate the first (or last) target time. Then use countif() and aggregation:

select session_id,
       countif(target_hit_timestamp > hit_timestamp and event_category = 'event1') as event1,
       countif(target_hit_timestamp > hit_timestamp and event_category = 'event2') as event2,
       countif(url like '%target') as target
from (select t.*,
             min(case when url like '%target' then hit_timestamp end) over (partition by session_id) as target_hit_timestamp
      from t
     ) t
group by session_id
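
To sanity-check this approach without a BigQuery project, here is a minimal sketch that runs the same logic against the sample rows from the question in an in-memory SQLite database. It is a stand-in, not BigQuery itself: SQLite has no countif(), so SUM(CASE ...) is used instead. The table name t follows the answer, the column is assumed to be event_category as in the question's table, and SQLite 3.25 or newer is assumed for window functions.

```python
import sqlite3

# Sample rows copied from the question; None stands for null.
ROWS = [
    (1, "11:12:23", "url134", "event1"),
    (1, "11:14:23", "url2234", "event2"),
    (1, "11:16:23", "url_target", None),
    (2, "03:12:11", "url2344", "event1"),
    (2, "03:14:11", "url43245", "event2"),
    (3, "09:10:11", "url5533", "event2"),
    (3, "09:09:11", "url_target", None),
    (4, "08:08:08", "url64356", "event2"),
    (4, "08:09:08", "url56456", "event2"),
    (4, "08:10:08", "url_target", None),
]

# Same shape as the answer's query, with countif(x) written as
# SUM(CASE WHEN x THEN 1 ELSE 0 END) because SQLite lacks countif().
QUERY = """
SELECT session_id,
       SUM(CASE WHEN target_hit_timestamp > hit_timestamp
                 AND event_category = 'event1' THEN 1 ELSE 0 END) AS event1,
       SUM(CASE WHEN target_hit_timestamp > hit_timestamp
                 AND event_category = 'event2' THEN 1 ELSE 0 END) AS event2,
       SUM(CASE WHEN url LIKE '%target' THEN 1 ELSE 0 END) AS target
FROM (SELECT t.*,
             -- earliest target hit in the session, repeated on every row
             MIN(CASE WHEN url LIKE '%target' THEN hit_timestamp END)
                 OVER (PARTITION BY session_id) AS target_hit_timestamp
      FROM t) AS s
GROUP BY session_id
ORDER BY session_id
"""


def build_matrix(rows):
    """Load the rows into an in-memory table and return the aggregated matrix."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (session_id INTEGER, hit_timestamp TEXT, "
        "url TEXT, event_category TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


matrix = build_matrix(ROWS)
for row in matrix:
    print(row)
```

On the sample data this reproduces the intended matrix for sessions 1, 2 and 4. Session 3 comes out as (3, 0, 0, 1) rather than all zeros, because the target count is unconditional and does not check whether an event preceded the target.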



Answer 2:


Consider:

select session_id,
    countif(cnt_url_target > 0 and event_category = 'event1') event1,
    countif(cnt_url_target > 0 and event_category = 'event2') event2,
    countif(url = 'url_target') target
from (
    select t.*,
        countif(url = 'url_target') over(partition by session_id order by hit_timestamp desc) cnt_url_target
    from mytable t
) t
group by session_id
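
The window function here counts url_target hits at or after each row, since ordering by hit_timestamp desc with the default frame accumulates from the latest hit backwards. The sketch below tries that idea on the question's sample rows in an in-memory SQLite database (a stand-in for BigQuery, with SUM(CASE ...) in place of countif(), and SQLite 3.25 or newer assumed). It also includes one possible adjustment that is not part of the answer: the target column is reported only when at least one event in the session is followed by a target, which is what the question's expected output appears to ask for.

```python
import sqlite3

# Sample rows copied from the question; None stands for null.
ROWS = [
    (1, "11:12:23", "url134", "event1"),
    (1, "11:14:23", "url2234", "event2"),
    (1, "11:16:23", "url_target", None),
    (2, "03:12:11", "url2344", "event1"),
    (2, "03:14:11", "url43245", "event2"),
    (3, "09:10:11", "url5533", "event2"),
    (3, "09:09:11", "url_target", None),
    (4, "08:08:08", "url64356", "event2"),
    (4, "08:09:08", "url56456", "event2"),
    (4, "08:10:08", "url_target", None),
]

QUERY = """
SELECT session_id,
       SUM(CASE WHEN cnt_url_target > 0
                 AND event_category = 'event1' THEN 1 ELSE 0 END) AS event1,
       SUM(CASE WHEN cnt_url_target > 0
                 AND event_category = 'event2' THEN 1 ELSE 0 END) AS event2,
       -- Adjustment (an assumption, not in the answer): report targets only
       -- if some event in the session is followed by a target hit.
       CASE WHEN SUM(CASE WHEN cnt_url_target > 0
                           AND event_category IS NOT NULL THEN 1 ELSE 0 END) > 0
            THEN SUM(CASE WHEN url = 'url_target' THEN 1 ELSE 0 END)
            ELSE 0
       END AS target
FROM (SELECT t.*,
             -- number of target hits at or after this row's timestamp
             SUM(CASE WHEN url = 'url_target' THEN 1 ELSE 0 END)
                 OVER (PARTITION BY session_id
                       ORDER BY hit_timestamp DESC) AS cnt_url_target
      FROM mytable AS t) AS s
GROUP BY session_id
ORDER BY session_id
"""


def build_matrix(rows):
    """Load the rows into an in-memory table and return the aggregated matrix."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (session_id INTEGER, hit_timestamp TEXT, "
        "url TEXT, event_category TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


matrix = build_matrix(ROWS)
for row in matrix:
    print(row)
```

With the adjustment, session 3 becomes (3, 0, 0, 0), so all four rows match the intended table in the question. Without it, the unconditional target count would give 1 for session 3.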


Source: https://stackoverflow.com/questions/64834149/bigquery-and-google-analytics-sql-query
