Question
I'm trying to pick up Google BigQuery and figure out how I can replicate some standard reporting for the London Cycle Helmet GA sample data. A simple example I've stumbled upon is selecting the sum of revenue split by landing page.
Nested tables are new to me and I'm struggling to find any examples that do this or similar using standard SQL.
How can this be done using standard SQL? Or can anyone point me towards any similar examples?
Update
Apologies for not providing more details upfront. I've made some progress, which lets me post some code. I now understand the data structure a little better and am attempting to unnest it like so:
#StandardSQL
SELECT Visit_ID, h.page.pagePath AS LandingPage, Sales, Revenue
FROM (
  SELECT
    visitID AS Visit_ID,
    h.hitNumber,
    h.page.pagePath
  FROM
    `project_id.dataset.table`, UNNEST(hits) as h
) AS landingpages
JOIN (
  SELECT
    fullVisitorId AS Visit_ID, sum(totals.transactions) AS Sales, (sum(totals.transactionRevenue)/1000000) AS Revenue
  FROM
    `project_id.dataset.table`
  WHERE
    totals.visits>0
    AND totals.transactions>=1
    AND totals.transactionRevenue IS NOT NULL
  GROUP BY
    fullVisitorId
) AS sales
ON landingpages.Visit_ID = sales.Visit_ID
This throws the error:
No matching signature for operator = for argument types: INT64, STRING. Supported signature: ANY = ANY at [23:4]
I think this is nearly there, but I don't understand what it's trying to tell me. How can I fix this join?
Answer 1:
"No matching signature for operator = for argument types: INT64, STRING. Supported signature: ANY = ANY at [23:4]"
"I don't understand what it's trying to tell me."
You are trying to join on equality of two entirely different fields.
They differ not only in value but also in type: landingpages.Visit_ID comes from visitId (INT64), while sales.Visit_ID comes from fullVisitorId (STRING). From the GA export schema:
Field Name | Data Type | Description
fullVisitorId | STRING | The unique visitor ID (also known as client ID).
visitId | INTEGER | An identifier for this session. This is part of the value usually stored as the _utmb cookie. This is only unique to the user. For a completely unique ID, you should use a combination of fullVisitorId and visitId.
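As a minimal sketch of what "a combination of fullVisitorId and visitId" can look like in practice, the query below builds a single session key from both fields. The inline `sessions` rows and the `sessionKey` name are hypothetical, made up for illustration only; with real data you would select from the export table instead.

```sql
#standardSQL
-- Hypothetical inline rows standing in for the GA export table.
WITH sessions AS (
  SELECT '1234567890123456789' AS fullVisitorId, 1475000000 AS visitId
  UNION ALL
  SELECT '9876543210987654321' AS fullVisitorId, 1475000000 AS visitId
)
SELECT
  fullVisitorId,
  visitId,
  -- visitId is INT64, so it must be cast before concatenating with the STRING id.
  CONCAT(fullVisitorId, '-', CAST(visitId AS STRING)) AS sessionKey
FROM sessions
```

Both sample rows share the same visitId, yet their session keys differ, which is why joining on visitId alone is not safe across visitors.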
"How can I fix this join?"
Try the query below. I am not a GA expert (I added the relevant tag to the post), but it should at least help you get to the next step. I tried to preserve and reuse your original code as much as possible.
#StandardSQL
WITH landingpages AS (
  -- One row per session: the page of the first hit is the landing page.
  SELECT
    fullVisitorId,
    visitID,
    h.page.pagePath AS LandingPage
  FROM
    `project_id.dataset.table`, UNNEST(hits) AS h
  WHERE h.hitNumber = 1
),
sales AS (
  -- Transactions and revenue per session; revenue is stored multiplied by 10^6.
  SELECT
    fullVisitorId, visitID, SUM(totals.transactions) AS Transactions, (SUM(totals.transactionRevenue)/1000000) AS Revenue
  FROM
    `project_id.dataset.table`
  WHERE
    totals.visits > 0
    AND totals.transactions >= 1
    AND totals.transactionRevenue IS NOT NULL
  GROUP BY fullVisitorId, visitID
)
SELECT
  LandingPage,
  SUM(Transactions) AS Transactions,
  SUM(Revenue) AS Revenue
FROM landingpages
JOIN sales
-- Join on both keys: visitID alone is only unique per visitor.
ON landingpages.VisitID = sales.VisitID
AND landingpages.fullVisitorId = sales.fullVisitorId
GROUP BY LandingPage
Answer 2:
Have you looked at this?
You probably want to:
- Get the visitIds and their respective revenue values (if you're looking for transaction revenue, use totals.totalTransactionRevenue)
- Get, for each visit, the hit where hits.isEntrance is TRUE
- Take that hit's page (hits.page.pagePath)
Keep in mind that revenue in GA is stored multiplied by 10^6, so 2300000 means 2.3 (USD or whatever currency you're looking at)
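The steps above can be sketched as a single query with no join. This is only an illustration under assumptions: the inline `sessions` rows are hypothetical sample data shaped like the GA export (with real data you would read from `project_id.dataset.table` instead), and it relies on each session having exactly one hit flagged with isEntrance.

```sql
#standardSQL
-- Hypothetical inline sample shaped like the GA export; replace with the real table.
WITH sessions AS (
  SELECT
    'A' AS fullVisitorId,
    1 AS visitId,
    STRUCT(1 AS transactions, 2300000 AS totalTransactionRevenue) AS totals,
    [
      STRUCT(1 AS hitNumber, TRUE AS isEntrance, STRUCT('/home' AS pagePath) AS page),
      STRUCT(2 AS hitNumber, FALSE AS isEntrance, STRUCT('/basket' AS pagePath) AS page)
    ] AS hits
  UNION ALL
  SELECT
    'B' AS fullVisitorId,
    2 AS visitId,
    STRUCT(1 AS transactions, 4700000 AS totalTransactionRevenue) AS totals,
    [
      STRUCT(1 AS hitNumber, TRUE AS isEntrance, STRUCT('/home' AS pagePath) AS page)
    ] AS hits
)
SELECT
  h.page.pagePath AS LandingPage,
  SUM(totals.transactions) AS Transactions,
  -- Revenue is stored multiplied by 10^6, so divide to get currency units.
  SUM(totals.totalTransactionRevenue) / 1000000 AS Revenue
FROM sessions, UNNEST(hits) AS h
-- Keep only the entrance hit, so session-level totals are counted once per session.
WHERE h.isEntrance
GROUP BY LandingPage
```

Because only one hit per session survives the filter, the session-level totals are not duplicated by the UNNEST. If a session could carry more than one entrance hit, the totals would be double counted and the join-based approach would be the safer choice.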
That should be enough to get what you want. The cookbook is also a great place to learn these things, as nested fields are quite tricky sometimes:
Source: https://stackoverflow.com/questions/39894328/select-revenue-per-landing-page-for-nested-table-using-google-big-query