How to obtain the most recent row per type and perform calculations, depending on the row type?

谁说胖子不能爱 submitted on 2019-12-05 21:30:45

How can this query be optimized?

Try the version below:

#standardSQL
WITH types AS (
  -- Sorting on the boolean (event_type = ...) DESC puts rows of the wanted type first,
  -- then event_timestamp DESC picks the newest of them. If a message has no row of
  -- that type, the newest row of any type is used instead.
  SELECT 
    FORMAT_TIMESTAMP('%Y-%m-%d', sent_at) AS sent_at,
    message_id,
    FIRST_VALUE(status) OVER(PARTITION BY message_id ORDER BY (event_type = "create") DESC, event_timestamp DESC) AS submitted_status,
    FIRST_VALUE(status) OVER(PARTITION BY message_id ORDER BY (event_type = "status_update") DESC, event_timestamp DESC) AS delivered_status,
    FIRST_VALUE(rate) OVER(PARTITION BY message_id ORDER BY (event_type IN ("rate_update", "create")) DESC, event_timestamp DESC) AS sales_rate
  FROM events
), latest AS (
  -- Every row of a message carries the same window results, so ANY_VALUE
  -- collapses them to one row per sent_at date and message_id.
  SELECT 
    sent_at,
    message_id,
    ANY_VALUE(IF(submitted_status=0,1,0)) AS submitted,  
    ANY_VALUE(IF(delivered_status=1,1,0)) AS delivered,  
    ANY_VALUE(sales_rate) AS sales_rate
  FROM types
  GROUP BY 1, 2
)
-- Daily totals
SELECT   
  sent_at,
  SUM(submitted) AS submitted,  
  SUM(delivered) AS delivered,  
  SUM(sales_rate) AS sales_rate_total        
FROM latest
GROUP BY 1

It is compact and easy to maintain, with no redundancy and no joins at all.
If your table is partitioned, you can take advantage of that by adjusting the query in just one place.
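
As a rough sketch of the "one place" mentioned above: the partition filter would go right after FROM events in the first CTE. This assumes an ingestion-time partitioned table, which exposes the _PARTITIONTIME pseudo column. The date range is hypothetical, and only one of the three window columns is shown to keep the example short.

```sql
#standardSQL
WITH types AS (
  SELECT
    FORMAT_TIMESTAMP('%Y-%m-%d', sent_at) AS sent_at,
    message_id,
    FIRST_VALUE(status) OVER(PARTITION BY message_id ORDER BY (event_type = "create") DESC, event_timestamp DESC) AS submitted_status
  FROM events
  -- Assumption: ingestion-time partitioning; the date range is made up for illustration.
  -- WHERE runs before the window functions, so only the scanned partitions are ranked.
  WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2016-11-25') AND TIMESTAMP('2016-11-27')
)
SELECT * FROM types
```

Note that such a filter also hides events stored in other partitions, so a message whose events span the boundary of the range may be evaluated on partial history.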

You can use the dummy data below if you want to check the query on a small volume first:

WITH events AS (
  SELECT 1 AS id, 'create' AS event_type, TIMESTAMP '2016-11-25 09:17:48' AS event_timestamp, 1 AS message_id, TIMESTAMP '2016-11-25 09:17:48' AS sent_at, 0 AS status, 0.500000 AS rate UNION ALL
  SELECT 2 AS id, 'status_update' AS event_type, TIMESTAMP '2016-11-25 09:24:38' AS event_timestamp, 1 AS message_id, TIMESTAMP '2016-11-25 09:28:49' AS sent_at, 1 AS status, 0.500000 AS rate UNION ALL
  SELECT 3 AS id, 'create' AS event_type, TIMESTAMP '2016-11-25 09:47:48' AS event_timestamp, 2 AS message_id, TIMESTAMP '2016-11-25 09:47:48' AS sent_at, 0 AS status, 0.500000 AS rate UNION ALL
  SELECT 4 AS id, 'status_update' AS event_type, TIMESTAMP '2016-11-25 09:54:38' AS event_timestamp, 2 AS message_id, TIMESTAMP '2016-11-25 09:48:49' AS sent_at, 1 AS status, 0.500000 AS rate UNION ALL
  SELECT 5 AS id, 'rate_update' AS event_type, TIMESTAMP '2016-11-25 09:55:07' AS event_timestamp, 2 AS message_id, TIMESTAMP '2016-11-25 09:50:07' AS sent_at, 0 AS status, 1.000000 AS rate UNION ALL
  SELECT 6 AS id, 'create' AS event_type, TIMESTAMP '2016-11-26 09:17:48' AS event_timestamp, 3 AS message_id, TIMESTAMP '2016-11-26 09:17:48' AS sent_at, 0 AS status, 0.500000 AS rate UNION ALL
  SELECT 7 AS id, 'create' AS event_type, TIMESTAMP '2016-11-27 09:17:48' AS event_timestamp, 4 AS message_id, TIMESTAMP '2016-11-27 09:17:48' AS sent_at, 0 AS status, 0.500000 AS rate UNION ALL
  SELECT 8 AS id, 'rate_update' AS event_type, TIMESTAMP '2016-11-27 09:55:07' AS event_timestamp, 4 AS message_id, TIMESTAMP '2016-11-27 09:50:07' AS sent_at, 0 AS status, 2.000000 AS rate UNION ALL
  SELECT 9 AS id, 'rate_update' AS event_type, TIMESTAMP '2016-11-27 09:55:07' AS event_timestamp, 2 AS message_id, TIMESTAMP '2016-11-25 09:55:07' AS sent_at, 0 AS status, 2.000000 AS rate 
)
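
Since both the dummy data and the query start with WITH, they have to be merged into a single WITH clause, with events as the first CTE. A minimal sketch of that wiring, using only the three rows of message 2 from 2016-11-25 (ids 3, 4, and 5) so that the result is easy to check by hand:

```sql
#standardSQL
WITH events AS (
  SELECT 3 AS id, 'create' AS event_type, TIMESTAMP '2016-11-25 09:47:48' AS event_timestamp, 2 AS message_id, TIMESTAMP '2016-11-25 09:47:48' AS sent_at, 0 AS status, 0.500000 AS rate UNION ALL
  SELECT 4, 'status_update', TIMESTAMP '2016-11-25 09:54:38', 2, TIMESTAMP '2016-11-25 09:48:49', 1, 0.500000 UNION ALL
  SELECT 5, 'rate_update', TIMESTAMP '2016-11-25 09:55:07', 2, TIMESTAMP '2016-11-25 09:50:07', 0, 1.000000
), types AS (  -- note the comma: the query's CTEs continue the same WITH clause
  SELECT
    FORMAT_TIMESTAMP('%Y-%m-%d', sent_at) AS sent_at,
    message_id,
    FIRST_VALUE(status) OVER(PARTITION BY message_id ORDER BY (event_type = "create") DESC, event_timestamp DESC) AS submitted_status,
    FIRST_VALUE(status) OVER(PARTITION BY message_id ORDER BY (event_type = "status_update") DESC, event_timestamp DESC) AS delivered_status,
    FIRST_VALUE(rate) OVER(PARTITION BY message_id ORDER BY (event_type IN ("rate_update", "create")) DESC, event_timestamp DESC) AS sales_rate
  FROM events
), latest AS (
  SELECT
    sent_at,
    message_id,
    ANY_VALUE(IF(submitted_status=0,1,0)) AS submitted,
    ANY_VALUE(IF(delivered_status=1,1,0)) AS delivered,
    ANY_VALUE(sales_rate) AS sales_rate
  FROM types
  GROUP BY 1, 2
)
SELECT
  sent_at,
  SUM(submitted) AS submitted,
  SUM(delivered) AS delivered,
  SUM(sales_rate) AS sales_rate_total
FROM latest
GROUP BY 1
```

With these three rows the query should return a single row for 2016-11-25 with submitted = 1, delivered = 1, and sales_rate_total = 1.0, because the rate_update event is the newest of the create and rate_update rows.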

For every table that holds multiple events and where we need to pick the latest one, we have a view in place.

View: user_profile_latest

SELECT * FROM (
  SELECT
    RANK() OVER (PARTITION BY user_id ORDER BY bq.created DESC, bq.insert_id DESC) AS _rank,
    *
  FROM [user_profile_event]
) WHERE _rank = 1

We maintain a record, bq, with the fields created and insert_id for deduplication purposes.
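
The view above is written in legacy SQL (note the square brackets around the table name). A possible standard SQL equivalent is sketched below. The dataset name my_dataset is hypothetical, and ROW_NUMBER() is swapped in for RANK() so that exactly one row per user_id is kept even if two events tie on both bq.created and bq.insert_id.

```sql
#standardSQL
SELECT * EXCEPT (_rank)  -- drop the helper column from the output
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY user_id
      ORDER BY bq.created DESC, bq.insert_id DESC
    ) AS _rank
  FROM `my_dataset.user_profile_event`  -- hypothetical dataset name
)
WHERE _rank = 1
```

Unlike the legacy view, this version does not expose the _rank column to consumers of the view.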
