I\'m trying to display the source property name within the Google Analytics roll up property I have linked to bigquery. Issue is, is that when I try the below some of the me
You're flattening your table in your most outer FROM statement (ie here:
FROM
project.dataset.ga_sessions_*
, UNNEST(hits) as h)
All your session level dimensions eg device.* or totals.* values like totals.transactions are already rolled up to a session level, so when you flatten the table by unnesting the hits, these totals values get written as many times as there are hits. Example: Let's say there are 30 hits in one session and 2 transactions, since you flatten/unnest your hits, you will be left with 30 rows containing totals.transactions = 2, so when you sum over them, the result would be that there were 60 transactions in this session. Your users and visits don't get affected by this since you count distinct them, so any dupes get eliminated.
It looks to me as if you're query will work if you just remove the , UNNEST(hits) as h if you remove or adapt this line
h.sourcePropertyInfo.sourcePropertyDisplayName as display
since you already unnest the hits within the select statement where required apart from in this particular line.
As you need to compute several values on the hits level maybe unnesting the field hits is the best approach. The downside is that you lose the totals field aggregation for the session level but still you can work it around.
As an example:
SELECT
date,
CASE
WHEN REGEXP_CONTAINS(h.sourcePropertyInfo.sourcePropertyTrackingId, r'82272640') THEN 'MUG'
WHEN h.sourcePropertyInfo.sourcePropertyTrackingId = 'Social' THEN 'Social'ELSE 'Website'
END AS Property,
geoNetwork.country AS Country,
COUNT(DISTINCT CONCAT(CAST(visitId AS STRING),fullVisitorId)) AS visits,
COUNT(DISTINCT(fullVisitorId)) AS Users,
h.sourcePropertyInfo.sourcePropertyDisplayName AS display,
SUM(CASE
WHEN REGEXP_CONTAINS(h.page.pagepath, r'/') THEN h.latencyTracking.pageLoadTime END) / SUM(CASE
WHEN REGEXP_CONTAINS(h.page.pagepath, r'/') THEN h.latencyTracking.pageLoadSample END) AS pageloadspeed,
COUNT(DISTINCT
CASE
WHEN totals.newVisits = 1 THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END) new_visits,
COUNT(CASE
WHEN h.type = 'PAGE' THEN h.page.pagepath END) pageviews,
SUM(CASE
WHEN (h.isentrance = TRUE AND h.isexit = TRUE) THEN 1 END) bounces,
COUNT(DISTINCT (CASE
WHEN device.isMobile = TRUE THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END)) mobilevisits,
COUNT(DISTINCT (CASE
WHEN trafficSource.medium = 'organic' THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END)) organicvisits,
SUM(CASE
WHEN REGEXP_CONTAINS(h.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro') THEN 1 END) AS NewRegistrations,
SUM(CASE
WHEN REGEXP_CONTAINS(h.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar|addtobasket::') THEN 1 END) AS ClickToBuy,
COUNT(h.transaction.transactionid) transactions
FROM
`project_id.dataset_id.ga_sessions_*`,
UNNEST(hits) AS h
WHERE
1 = 1
AND PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)')) BETWEEN TIMESTAMP('2017-05-01') AND TIMESTAMP('2017-05-01')
GROUP BY
date,
Country,
display,
Property
I ran it against our dataset and it seems to be working. Some changes I did:
MAX
operation for the Property and added it to the group by.after UNNESTing an array, use MAX() instead SUMMing them up to report on session level matrices:
use:
- MAX(totals.screenviews) AS PAGEVIEWS,
- MAX(totals.bounces) AS BOUNCES,
- MAX(totals.transactions) AS Transactions
- ...
- ...
instead of:
- SUM(totals.screenviews) AS PAGEVIEWS,
- SUM(totals.bounces) AS BOUNCES,
- SUM(totals.transactions) AS Transactions
this should partially solve your issue. Let me know how it goes?
I think the reason why the calculated bounce rate is much higher in the query from William Fuks is the following
WHEN (h.isentrance = TRUE AND h.isexit = TRUE) THEN 1 END) bounces
It seems that isEntrance and isExit only happens on PAGE hits and so does not account for events. Hence the over count of bounces is due to single page views that might have had one or more interaction events happen on the page.