Question
I am trying to build a matrix from a table of Google Analytics data imported into BigQuery. Each row represents a hit on a website and contains a session_id along with properties such as the URL and the timestamp. Some hits also carry metadata about user-defined actions, which we refer to as events. Below is an example of the table.
session_id  hit_timestamp  url         event_category
1           11:12:23       url134      event1
1           11:14:23       url2234     event2
1           11:16:23       url_target  null
2           03:12:11       url2344     event1
2           03:14:11       url43245    event2
3           09:10:11       url5533     event2
3           09:09:11       url_target  null
4           08:08:08       url64356    event2
4           08:09:08       url56456    event2
4           08:10:08       url_target  null
The intended result should look like the table below.
session_id  event1  event2  target
1           1       1       1
2           0       0       0
3           0       0       0
4           0       2       1
Note that any event that does not lead to url_target should be counted as zero, and so should the target. In other words, the query has to compare timestamps to check whether each event is followed by url_target. For example, in session_id 2 neither event1 nor event2 is followed by url_target, which is why that row is all zeros. The same applies to session_id 3: its url_target hit (09:09:11) comes before event2 (09:10:11), not after it, so that row is also all zeros.
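To make the rule concrete, here is a minimal Python sketch (not SQL, and not from the question) that applies it to the sample rows above. It assumes that hit_timestamp values are zero-padded HH:MM:SS strings, so they sort chronologically; that an event counts when any url_target hit in the same session comes after it; and that target counts the url_target hits preceded by at least one event. HITS, TARGET_URL and build_matrix are hypothetical names.

```python
# Sample hits copied from the question: (session_id, hit_timestamp, url, event_category).
HITS = [
    (1, "11:12:23", "url134", "event1"),
    (1, "11:14:23", "url2234", "event2"),
    (1, "11:16:23", "url_target", None),
    (2, "03:12:11", "url2344", "event1"),
    (2, "03:14:11", "url43245", "event2"),
    (3, "09:10:11", "url5533", "event2"),
    (3, "09:09:11", "url_target", None),
    (4, "08:08:08", "url64356", "event2"),
    (4, "08:09:08", "url56456", "event2"),
    (4, "08:10:08", "url_target", None),
]

TARGET_URL = "url_target"


def build_matrix(hits):
    """Return {session_id: (event1, event2, target)} under the assumed rule."""
    sessions = {}
    for session_id, ts, url, category in hits:
        sessions.setdefault(session_id, []).append((ts, url, category))

    matrix = {}
    for session_id, rows in sorted(sessions.items()):
        # Zero-padded "HH:MM:SS" strings compare in chronological order.
        target_times = [ts for ts, url, _ in rows if url == TARGET_URL]
        last_target = max(target_times) if target_times else None
        # An event counts only if some url_target hit comes after it.
        counted = [c for ts, _, c in rows
                   if c is not None and last_target is not None and ts < last_target]
        # A target hit counts only if at least one event comes before it.
        event_times = [ts for ts, _, c in rows if c is not None]
        first_event = min(event_times) if event_times else None
        target = sum(1 for t in target_times
                     if first_event is not None and t > first_event)
        matrix[session_id] = (counted.count("event1"), counted.count("event2"), target)
    return matrix


print(build_matrix(HITS))
```

On the sample data this reproduces the expected table row for row.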
I would appreciate any help constructing the SQL query that produces this matrix. I was only able to group by session_id and count events with COUNT, but I could not find the right SQL query to compare the timestamps and check the other fields.
Answer 1:
Use a subquery to calculate the first (or last) target time in each session. Then use countif() and aggregation:
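select session_id,
       -- count an event only if it happened before the session's first target hit
       countif(target_hit_timestamp > hit_timestamp and event_category = 'event1') as event1,
       countif(target_hit_timestamp > hit_timestamp and event_category = 'event2') as event2,
       countif(url like '%target') as target
from (select t.*,
             -- earliest url_target timestamp in the session (NULL if there is none)
             min(case when url like '%target' then hit_timestamp end)
                 over (partition by session_id) as target_hit_timestamp
      from t  -- t: the imported hits table
     ) t
group by session_id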
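As a rough check, the sketch below ports Answer 1 to SQLite through Python's built-in sqlite3 module and runs it on the sample rows from the question. Assumptions: SQLite has no countif(), so each countif(cond) is rewritten as SUM(CASE WHEN cond THEN 1 ELSE 0 END); the MIN(...) OVER window needs SQLite 3.25 or newer; the table is named t as in the answer; HITS and run_answer1 are illustrative names.

```python
import sqlite3

# Sample hits from the question; zero-padded "HH:MM:SS" text compares chronologically.
HITS = [
    (1, "11:12:23", "url134", "event1"),
    (1, "11:14:23", "url2234", "event2"),
    (1, "11:16:23", "url_target", None),
    (2, "03:12:11", "url2344", "event1"),
    (2, "03:14:11", "url43245", "event2"),
    (3, "09:10:11", "url5533", "event2"),
    (3, "09:09:11", "url_target", None),
    (4, "08:08:08", "url64356", "event2"),
    (4, "08:09:08", "url56456", "event2"),
    (4, "08:10:08", "url_target", None),
]

# Answer 1 in SQLite syntax: countif(cond) becomes SUM(CASE WHEN cond THEN 1 ELSE 0 END).
QUERY = """
SELECT session_id,
       SUM(CASE WHEN target_hit_timestamp > hit_timestamp
                 AND event_category = 'event1' THEN 1 ELSE 0 END) AS event1,
       SUM(CASE WHEN target_hit_timestamp > hit_timestamp
                 AND event_category = 'event2' THEN 1 ELSE 0 END) AS event2,
       SUM(CASE WHEN url LIKE '%target' THEN 1 ELSE 0 END) AS target
FROM (SELECT t.*,
             MIN(CASE WHEN url LIKE '%target' THEN hit_timestamp END)
                 OVER (PARTITION BY session_id) AS target_hit_timestamp
      FROM t) AS t
GROUP BY session_id
ORDER BY session_id
"""


def run_answer1(hits):
    """Load the hits into an in-memory table t and return the query's rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (session_id INTEGER, hit_timestamp TEXT, "
                 "url TEXT, event_category TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", hits)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in run_answer1(HITS):
    print(row)
```

On this data the event1 and event2 columns match the expected table, but the target column for session_id 3 comes out as 1 rather than 0, because countif(url like '%target') counts the target hit whether or not an event precedes it.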
Answer 2:
Consider:
select session_id,
       -- count an event only if at least one url_target hit occurs at or after it
       countif(cnt_url_target > 0 and event_category = 'event1') event1,
       countif(cnt_url_target > 0 and event_category = 'event2') event2,
       countif(url = 'url_target') target
from (
    select t.*,
           -- running count of url_target hits from the latest hit back to this one
           countif(url = 'url_target') over(partition by session_id order by hit_timestamp desc) cnt_url_target
    from mytable t
) t
group by session_id
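The sketch below ports Answer 2 to SQLite in the same way (countif() rewritten as SUM(CASE ...), window functions needing SQLite 3.25 or newer) and runs it on the sample rows. It also makes one change that is not part of the original answer: a second running count, cnt_events_before (a hypothetical name), so that a url_target hit only counts toward target when at least one event occurs at or before it. Without that change, the target column for session_id 3 would be 1, while the expected table shows 0. HITS and run_answer2_adjusted are illustrative names.

```python
import sqlite3

# Sample hits from the question; zero-padded "HH:MM:SS" text compares chronologically.
HITS = [
    (1, "11:12:23", "url134", "event1"),
    (1, "11:14:23", "url2234", "event2"),
    (1, "11:16:23", "url_target", None),
    (2, "03:12:11", "url2344", "event1"),
    (2, "03:14:11", "url43245", "event2"),
    (3, "09:10:11", "url5533", "event2"),
    (3, "09:09:11", "url_target", None),
    (4, "08:08:08", "url64356", "event2"),
    (4, "08:09:08", "url56456", "event2"),
    (4, "08:10:08", "url_target", None),
]

QUERY = """
SELECT session_id,
       SUM(CASE WHEN cnt_url_target > 0 AND event_category = 'event1' THEN 1 ELSE 0 END) AS event1,
       SUM(CASE WHEN cnt_url_target > 0 AND event_category = 'event2' THEN 1 ELSE 0 END) AS event2,
       -- Hypothetical change: count a target hit only if an event came at or before it.
       SUM(CASE WHEN url = 'url_target' AND cnt_events_before > 0 THEN 1 ELSE 0 END) AS target
FROM (SELECT t.*,
             -- url_target hits at or after this row (running count, latest first)
             SUM(CASE WHEN url = 'url_target' THEN 1 ELSE 0 END)
                 OVER (PARTITION BY session_id ORDER BY hit_timestamp DESC) AS cnt_url_target,
             -- events at or before this row (running count, earliest first)
             SUM(CASE WHEN event_category IS NOT NULL THEN 1 ELSE 0 END)
                 OVER (PARTITION BY session_id ORDER BY hit_timestamp) AS cnt_events_before
      FROM mytable AS t) AS t
GROUP BY session_id
ORDER BY session_id
"""


def run_answer2_adjusted(hits):
    """Load the hits into an in-memory table mytable and return the query's rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (session_id INTEGER, hit_timestamp TEXT, "
                 "url TEXT, event_category TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", hits)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in run_answer2_adjusted(HITS):
    print(row)
```

With the extra running count, the output matches every row of the expected table. In BigQuery, the same change could presumably keep countif(), for example countif(url = 'url_target' and cnt_events_before > 0), with cnt_events_before computed as countif(event_category is not null) over (partition by session_id order by hit_timestamp).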
Source: https://stackoverflow.com/questions/64834149/bigquery-and-google-analytics-sql-query