I want to calculate the total timeOnSite for all visitors to a website (and divide it by 3600 because it's stored as seconds in the raw data), and then I want to break it down
I couldn't fully test this one but it seems to be working against my dataset:
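    SELECT
      date,
      -- fullVisitorId + visitId identifies a session; the '-' separator
      -- prevents two different ID pairs from concatenating to the same string
      COUNT(DISTINCT CONCAT(fv, '-', CAST(v AS STRING))) AS sessions,
      AVG(tos) AS avg_time_on_site,  -- in hours; use SUM(tos) for the total
      cg AS content_group,
      cl AS content_level
    FROM (
      -- One row per session: hits is only unnested inside the ARRAY(...)
      -- subqueries, so timeOnSite is never repeated once per hit
      SELECT
        date,
        fullVisitorId AS fv,
        visitId AS v,
        ARRAY(SELECT DISTINCT contentGroup.contentGroup1 FROM UNNEST(hits)) AS content_group,
        ARRAY(
          SELECT DISTINCT custd.value
          FROM UNNEST(hits) AS hits, UNNEST(hits.customDimensions) AS custd
          WHERE custd.index = 51
        ) AS content_level,
        totals.timeOnSite / 3600 AS tos  -- seconds -> hours
      FROM `dataset_id.ga_sessions_20170101`
      WHERE totals.timeOnSite IS NOT NULL
    )
    -- distinct aliases (cg, cl) avoid shadowing the ARRAY columns above
    CROSS JOIN UNNEST(content_group) AS cg
    -- LEFT JOIN keeps sessions without custom dimension 51 (cl is NULL)
    LEFT JOIN UNNEST(content_level) AS cl
    GROUP BY date, cg, cl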
What I tried to do first was to avoid running UNNEST(hits) over the entire dataset. That is why, in the inner SELECT (the subquery), content_group and content_level are kept as ARRAYs: each session is still a single row, carrying its distinct content groups and content levels with it.
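A minimal sketch of what the inner SELECT produces, run against one hypothetical session built inline with a WITH clause (the IDs, the 'blog'/'shop' groups, and the 'beginner'/'advanced' levels are made up, and only the ga_sessions fields the query uses are modeled). It should return a single row whose content_group and content_level ARRAYs hold the distinct values, in no guaranteed order, alongside tos = 2.0 hours:

```sql
-- Hypothetical toy data: one session shaped like a ga_sessions_* row,
-- with three hits (two in 'blog', one in 'shop').
WITH sessions AS (
  SELECT
    '1001' AS fullVisitorId,
    1 AS visitId,
    STRUCT(7200 AS timeOnSite) AS totals,
    [
      STRUCT(STRUCT('blog' AS contentGroup1) AS contentGroup,
             [STRUCT(51 AS index, 'beginner' AS value)] AS customDimensions),
      STRUCT(STRUCT('blog' AS contentGroup1) AS contentGroup,
             [STRUCT(51 AS index, 'advanced' AS value)] AS customDimensions),
      -- index 10 is filtered out by WHERE custd.index = 51
      STRUCT(STRUCT('shop' AS contentGroup1) AS contentGroup,
             [STRUCT(10 AS index, 'other' AS value)] AS customDimensions)
    ] AS hits
)
-- Same shape as the inner SELECT: hits is only unnested inside ARRAY(...),
-- so the session stays one row.
SELECT
  fullVisitorId AS fv,
  visitId AS v,
  ARRAY(SELECT DISTINCT contentGroup.contentGroup1 FROM UNNEST(hits)) AS content_group,
  ARRAY(
    SELECT DISTINCT custd.value
    FROM UNNEST(hits) AS hits, UNNEST(hits.customDimensions) AS custd
    WHERE custd.index = 51
  ) AS content_level,
  totals.timeOnSite / 3600 AS tos
FROM sessions
```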
In the outer SELECT, I unnested both of those ARRAYs, then counted the total sessions and computed the average time on site while grouping by the desired fields. I used the average because it seems to make more sense for time on site, but if you need the total you can just change AVG to SUM.
You won't get repeated timeOnSite values in this query because hits is never unnested at the session level. When content_group and content_level are unnested, each value inside those ARRAYs is paired exactly once with its session's tos, so within each (date, content_group, content_level) group a session's time on site is counted only once. A session that touched several content groups does contribute its time to each of those groups.
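To make the duplication concrete, here is a hedged comparison on one hypothetical 2-hour session with three hits (two in 'blog', one in 'shop'); only content groups are modeled to keep it short. The `naive` CTE unnests hits at the session level and sums timeOnSite per group, while `per_session` follows the approach above and collects distinct groups into an ARRAY before unnesting:

```sql
-- Hypothetical toy data: one 2-hour session (7200 s) with three hits,
-- two in content group 'blog' and one in 'shop'.
WITH sessions AS (
  SELECT
    STRUCT(7200 AS timeOnSite) AS totals,
    [
      STRUCT(STRUCT('blog' AS contentGroup1) AS contentGroup),
      STRUCT(STRUCT('blog' AS contentGroup1) AS contentGroup),
      STRUCT(STRUCT('shop' AS contentGroup1) AS contentGroup)
    ] AS hits
),
-- Naive: unnesting hits at the session level repeats timeOnSite once per hit.
naive AS (
  SELECT
    hit.contentGroup.contentGroup1 AS content_group,
    SUM(totals.timeOnSite) / 3600 AS hours
  FROM sessions, UNNEST(hits) AS hit
  GROUP BY content_group
),
-- Per-session ARRAYs first, then unnest: one row per (session, group).
per_session AS (
  SELECT
    cg AS content_group,
    SUM(tos) AS hours
  FROM (
    SELECT
      ARRAY(SELECT DISTINCT contentGroup.contentGroup1 FROM UNNEST(hits)) AS content_groups,
      totals.timeOnSite / 3600 AS tos
    FROM sessions
  )
  CROSS JOIN UNNEST(content_groups) AS cg
  GROUP BY cg
)
SELECT 'naive' AS approach, content_group, hours FROM naive
UNION ALL
SELECT 'per_session', content_group, hours FROM per_session
ORDER BY approach, content_group
```

On this toy data the naive branch reports 4.0 hours for 'blog' (the session's 2 hours counted once per 'blog' hit), while the per-session branch reports 2.0 hours for both 'blog' and 'shop'.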