Select revenue per landing page for nested table using Google Big Query

Submitted by 孤人 on 2019-12-25 09:15:04

Question


I'm trying to pick up Google BigQuery and figure out how I can replicate some standard reporting for the London Cycle Helmet GA sample data. A simple example I've stumbled upon is selecting the sum of revenue split by landing page.

Nested tables are new to me and I'm struggling to find any examples that do this or similar using standard SQL.

How can this be done using standard SQL? Or can anyone point me towards any similar examples?

Update

Apologies for not providing more details upfront. I've made some progress, which lets me post some code. I now understand the data structure a little better and am attempting to un-nest it like so:

#StandardSQL
SELECT Visit_ID, h.page.pagePath AS LandingPage, Sales, Revenue
FROM (
  SELECT
    visitID AS Visit_ID,
    h.hitNumber,
    h.page.pagePath
  FROM
    `project_id.dataset.table`, UNNEST(hits) as h
)  AS landingpages
JOIN (
  SELECT
      fullVisitorId AS Visit_ID, sum(totals.transactions) AS Sales, (sum(totals.transactionRevenue)/1000000) AS Revenue
    FROM
      `project_id.dataset.table`
    WHERE
      totals.visits>0
      AND totals.transactions>=1
      AND totals.transactionRevenue IS NOT NULL
    GROUP BY
      fullVisitorId
) AS sales
ON landingpages.Visit_ID = sales.Visit_ID

This throws the error:

No matching signature for operator = for argument types: INT64, STRING. Supported signature: ANY = ANY at [23:4]

I think this is nearly there, but I don't understand what it's trying to tell me. How can I fix this join?


Answer 1:


No matching signature for operator = for argument types: INT64, STRING. Supported signature: ANY = ANY at [23:4]
I don't understand what it's trying to tell me.

You are trying to join on equality of two totally different fields: landingpages.Visit_ID is visitID, while sales.Visit_ID is fullVisitorId.
Not only do they hold different values, they even have different types:

Field Name      Data Type   Description
fullVisitorId   STRING      The unique visitor ID (also known as client ID).
visitId         INTEGER     An identifier for this session. This is part of the value usually stored as the _utmb cookie. This is only unique to the user. For a completely unique ID, you should use a combination of fullVisitorId and visitId.  
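
As a minimal sketch of the "combination of fullVisitorId and visitId" mentioned in the description above, the query below builds a combined session key from inline sample rows. The row values are made up (not taken from the GA sample data), and the column name session_id is only an illustrative choice.

```sql
#StandardSQL
-- Made-up sample rows: two visitors who happen to share the same visitId.
WITH sessions AS (
  SELECT '1234567890123456789' AS fullVisitorId, 1475000000 AS visitId
  UNION ALL
  SELECT '9876543210987654321', 1475000000
)
SELECT
  fullVisitorId,
  visitId,
  -- visitId is INT64, so cast it before concatenating with the STRING fullVisitorId.
  CONCAT(fullVisitorId, '-', CAST(visitId AS STRING)) AS session_id
FROM sessions
```

The two rows share a visitId but get different session_id values, which is why visitId alone does not identify a session.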

How can I fix this join?

Try the query below. I am not a GA person (I added the respective tag to the post), but it should at least help you get to the next step. I tried to preserve and reuse your original code as much as possible.

#StandardSQL
WITH landingpages AS (
  -- One row per session: the page of the first hit is treated as the landing page.
  SELECT
    fullVisitorId,
    visitId,
    h.page.pagePath AS LandingPage
  FROM
    `project_id.dataset.table`, UNNEST(hits) AS h
  WHERE h.hitNumber = 1
),
sales AS (
  -- One row per session that has at least one transaction.
  SELECT
    fullVisitorId,
    visitId,
    SUM(totals.transactions) AS Transactions,
    SUM(totals.transactionRevenue) / 1000000 AS Revenue
  FROM
    `project_id.dataset.table`
  WHERE
    totals.visits > 0
    AND totals.transactions >= 1
    AND totals.transactionRevenue IS NOT NULL
  GROUP BY fullVisitorId, visitId
)
-- Join on both fields, since visitId is only unique per user.
SELECT
  LandingPage,
  SUM(Transactions) AS Transactions,
  SUM(Revenue) AS Revenue
FROM landingpages
JOIN sales
  ON landingpages.visitId = sales.visitId
  AND landingpages.fullVisitorId = sales.fullVisitorId
GROUP BY LandingPage
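
To see why the join uses both fields, here is a small sketch that replaces the two CTEs above with made-up inline rows (the visitor IDs, pages and amounts are invented for illustration). Visitors A and B share visitId 1, but only A bought something.

```sql
#StandardSQL
-- Made-up rows standing in for the landingpages and sales CTEs.
WITH landingpages AS (
  SELECT 'A' AS fullVisitorId, 1 AS visitId, '/home' AS LandingPage
  UNION ALL
  SELECT 'B', 1, '/blog'
),
sales AS (
  SELECT 'A' AS fullVisitorId, 1 AS visitId, 1 AS Transactions, 2.3 AS Revenue
)
SELECT
  LandingPage,
  SUM(Transactions) AS Transactions,
  SUM(Revenue) AS Revenue
FROM landingpages
JOIN sales
  ON landingpages.visitId = sales.visitId
  AND landingpages.fullVisitorId = sales.fullVisitorId
GROUP BY LandingPage
-- Returns a single row: '/home', 1, 2.3.
-- Joining on visitId alone would also credit '/blog' with visitor A's sale.
```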



Answer 2:


Have you looked at this?

You probably want to:

  • Get the visitIds and their respective revenue values (for transaction revenue, that is totals.totalTransactionRevenue).
  • Get, for each visit, the hit that has hits.isEntrance == TRUE.
  • Get that hit's page (hits.page.pagePath).
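
The steps above might be sketched as a single query, shown below. This is only a sketch under assumptions: `project_id.dataset.table` is the same placeholder used earlier in this thread, each session is assumed to have at most one entrance hit (so the session-level totals are counted once), and it has not been checked against the London Cycle Helmet data.

```sql
#StandardSQL
SELECT
  h.page.pagePath AS LandingPage,
  SUM(totals.transactions) AS Transactions,
  -- Stored revenue is scaled by 10^6, so divide to get currency units.
  SUM(totals.totalTransactionRevenue) / 1000000 AS Revenue
FROM
  `project_id.dataset.table`, UNNEST(hits) AS h  -- placeholder table name
WHERE
  h.isEntrance = TRUE  -- keep only the entrance hit of each session
  AND totals.totalTransactionRevenue IS NOT NULL
GROUP BY LandingPage
```

Because the filter leaves one hit row per session, no join back to the session totals is needed in this variant.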

Keep in mind that revenue in GA is stored multiplied by 10^6, so 2300000 == 2.3 (USD or whatever currency you're looking at)
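
As a quick check of that scaling, using the number from the sentence above (the column names are only illustrative):

```sql
#StandardSQL
SELECT
  2300000 AS stored_value,
  2300000 / 1000000 AS revenue  -- 2.3
```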

That should be enough to get what you want. The cookbook is also a great place to learn these things, as nested fields can be quite tricky.



Source: https://stackoverflow.com/questions/39894328/select-revenue-per-landing-page-for-nested-table-using-google-big-query
