Google Analytics Metrics are inflated when extracting hit level data using BigQuery

前端 未结 4 1502
迷失自我
迷失自我 2020-12-22 11:52

I\'m trying to display the source property name within the Google Analytics roll up property I have linked to bigquery. Issue is, is that when I try the below some of the me

相关标签:
4条回答
  • 2020-12-22 12:18

    You're flattening your table in your most outer FROM statement (ie here:

    FROM project.dataset.ga_sessions_*, UNNEST(hits) as h)

    All your session level dimensions eg device.* or totals.* values like totals.transactions are already rolled up to a session level, so when you flatten the table by unnesting the hits, these totals values get written as many times as there are hits. Example: Let's say there are 30 hits in one session and 2 transactions, since you flatten/unnest your hits, you will be left with 30 rows containing totals.transactions = 2, so when you sum over them, the result would be that there were 60 transactions in this session. Your users and visits don't get affected by this since you count distinct them, so any dupes get eliminated.

    It looks to me as if you're query will work if you just remove the , UNNEST(hits) as h if you remove or adapt this line

    h.sourcePropertyInfo.sourcePropertyDisplayName as display

    since you already unnest the hits within the select statement where required apart from in this particular line.

    0 讨论(0)
  • 2020-12-22 12:33

    As you need to compute several values on the hits level maybe unnesting the field hits is the best approach. The downside is that you lose the totals field aggregation for the session level but still you can work it around.

    As an example:

    SELECT
      date,
      CASE
        WHEN REGEXP_CONTAINS(h.sourcePropertyInfo.sourcePropertyTrackingId, r'82272640') THEN 'MUG'
        WHEN h.sourcePropertyInfo.sourcePropertyTrackingId = 'Social' THEN 'Social'ELSE 'Website'
      END AS Property,
      geoNetwork.country AS Country,
      COUNT(DISTINCT CONCAT(CAST(visitId AS STRING),fullVisitorId)) AS visits,
      COUNT(DISTINCT(fullVisitorId)) AS Users,
      h.sourcePropertyInfo.sourcePropertyDisplayName AS display,
      SUM(CASE
          WHEN REGEXP_CONTAINS(h.page.pagepath, r'/') THEN h.latencyTracking.pageLoadTime END) / SUM(CASE
          WHEN REGEXP_CONTAINS(h.page.pagepath, r'/') THEN h.latencyTracking.pageLoadSample END) AS pageloadspeed,
      COUNT(DISTINCT
        CASE
          WHEN totals.newVisits = 1 THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END) new_visits,
      COUNT(CASE
             WHEN h.type = 'PAGE' THEN h.page.pagepath END) pageviews,
      SUM(CASE
           WHEN (h.isentrance = TRUE AND h.isexit = TRUE) THEN 1 END) bounces,
      COUNT(DISTINCT (CASE
            WHEN device.isMobile = TRUE THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END)) mobilevisits,
      COUNT(DISTINCT (CASE
            WHEN trafficSource.medium = 'organic' THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END)) organicvisits,
      SUM(CASE
           WHEN REGEXP_CONTAINS(h.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro') THEN 1 END) AS NewRegistrations,
      SUM(CASE
           WHEN REGEXP_CONTAINS(h.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar|addtobasket::') THEN 1 END) AS ClickToBuy,
      COUNT(h.transaction.transactionid) transactions
    FROM
      `project_id.dataset_id.ga_sessions_*`,
      UNNEST(hits) AS h
    WHERE
      1 = 1
      AND PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)')) BETWEEN TIMESTAMP('2017-05-01') AND TIMESTAMP('2017-05-01')
    GROUP BY
      date,
      Country,
      display,
      Property
    

    I ran it against our dataset and it seems to be working. Some changes I did:

    • Removed the MAX operation for the Property and added it to the group by.
    • pageviews was considered as the count of hits where hit.type = 'PAGE'. Not sure if this is the same for screenviews though.
    • bounce is computed when there's an entrance and exit event.
    • Total transactions is a count on transaction ids (hopefully this field is being filled in your dataset as well).
    0 讨论(0)
  • 2020-12-22 12:36

    after UNNESTing an array, use MAX() instead SUMMing them up to report on session level matrices:

    use:

    • MAX(totals.screenviews) AS PAGEVIEWS,
    • MAX(totals.bounces) AS BOUNCES,
    • MAX(totals.transactions) AS Transactions
    • ...
    • ...

    instead of:

    • SUM(totals.screenviews) AS PAGEVIEWS,
    • SUM(totals.bounces) AS BOUNCES,
    • SUM(totals.transactions) AS Transactions

    this should partially solve your issue. Let me know how it goes?

    0 讨论(0)
  • 2020-12-22 12:37

    I think the reason why the calculated bounce rate is much higher in the query from William Fuks is the following

    WHEN (h.isentrance = TRUE AND h.isexit = TRUE) THEN 1 END) bounces

    It seems that isEntrance and isExit only happens on PAGE hits and so does not account for events. Hence the over count of bounces is due to single page views that might have had one or more interaction events happen on the page.

    0 讨论(0)
提交回复
热议问题