How to aggregate multiple rows into one in BigQuery?

Submitted by 笑着哭i on 2020-02-02 06:25:53

Question


Suppose you have a de-normalized schema with multiple rows like below:

   uuid    |    property    |    value   
------------------------------------------
  abc      |   first_name   |  John
  abc      |   last_name    |  Connor
  abc      |   age          |  26
...

Every uuid has the same set of properties, and the rows are not necessarily sorted. How can I create a table like the one below using only BigQuery (i.e. without a client)?

Table user_properties:

   uuid    |    first_name  |    last_name   |    age
 --------------------------------------------------------
  abc      |   John         |    Connor      |    26

In SQL Server, the STUFF function (typically combined with FOR XML PATH) is used for this kind of row aggregation.

It would be easier if I could at least get the results ordered by uuid, so that the client would not need to load the whole table (4GB) to sort it: each entity could then be hydrated by sequentially scanning the rows with the same uuid. However, a query like this:

SELECT * FROM user_properties ORDER BY uuid; 

exceeds the available resources in BigQuery (and using allowLargeResults forbids ORDER BY). It almost seems that I cannot sort a large table (4GB) in BigQuery unless I pay for a high-end machine. Any ideas?
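
The client-side hydration the question describes (scanning rows that arrive ordered by uuid and folding each run into one entity) could look roughly like the sketch below. This is only an illustration: the function name `hydrate` and the second sample user (`xyz`) are made up for the example, and it assumes the rows really do arrive grouped by uuid.

```python
from itertools import groupby
from operator import itemgetter


def hydrate(sorted_rows):
    """Yield one dict per uuid from (uuid, property, value) rows.

    Assumes the rows are already ordered by uuid; the order of
    properties within a uuid does not matter.
    """
    for uuid, group in groupby(sorted_rows, key=itemgetter(0)):
        entity = {"uuid": uuid}
        for _, prop, value in group:
            entity[prop] = value
        yield entity


# Sample data: "abc" comes from the question, "xyz" is hypothetical.
rows = [
    ("abc", "first_name", "John"),
    ("abc", "last_name", "Connor"),
    ("abc", "age", "26"),
    ("xyz", "age", "31"),
    ("xyz", "first_name", "Sarah"),
    ("xyz", "last_name", "Reese"),
]

users = list(hydrate(rows))
```

Because `groupby` only merges adjacent rows, memory use stays at one entity at a time, which is why the ordering matters so much here.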


Answer 1:


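-- Pivot with conditional aggregation: within each uuid, IF() yields the value only
-- on the row carrying that property, and MAX() ignores the NULLs from the other rows.
SELECT 
  uuid,
  MAX(IF(property = 'first_name', value, NULL)) AS first_name,
  MAX(IF(property = 'last_name', value, NULL)) AS last_name,
  MAX(IF(property = 'age', value, NULL)) AS age
FROM user_properties
GROUP BY uuid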
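
To try the conditional-aggregation pattern without a BigQuery project, a small local sketch is shown below. It is not BigQuery: it uses Python's standard-library `sqlite3` as a stand-in, writes `CASE WHEN ... END` in place of `IF(...)`, and adds an `ORDER BY` only to make the tiny result deterministic. The function name `pivot_user_properties` and the `xyz` rows are hypothetical.

```python
import sqlite3

PIVOT_QUERY = """
    SELECT
      uuid,
      MAX(CASE WHEN property = 'first_name' THEN value END) AS first_name,
      MAX(CASE WHEN property = 'last_name'  THEN value END) AS last_name,
      MAX(CASE WHEN property = 'age'        THEN value END) AS age
    FROM user_properties
    GROUP BY uuid
    ORDER BY uuid
"""


def pivot_user_properties(rows):
    """Load (uuid, property, value) rows into SQLite and pivot them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE user_properties (uuid TEXT, property TEXT, value TEXT)"
        )
        conn.executemany("INSERT INTO user_properties VALUES (?, ?, ?)", rows)
        return conn.execute(PIVOT_QUERY).fetchall()
    finally:
        conn.close()


# Rows are deliberately unsorted; "xyz" is made-up sample data.
sample_rows = [
    ("xyz", "age", "31"),
    ("abc", "first_name", "John"),
    ("xyz", "last_name", "Reese"),
    ("abc", "age", "26"),
    ("abc", "last_name", "Connor"),
    ("xyz", "first_name", "Sarah"),
]

pivoted = pivot_user_properties(sample_rows)
```

The input order does not matter here, because the grouping is done by the engine rather than by the reader of the rows.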

Another option, with no GROUP BY involved:

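-- Sorted by property, each uuid's rows come in the order age, first_name, last_name,
-- so on the 'age' row LEAD(value, 1) is first_name and LEAD(value, 2) is last_name.
SELECT uuid, first_name, last_name, age  
FROM (
  SELECT 
    uuid,
    LEAD(value, 1) OVER(PARTITION BY uuid ORDER BY property) AS first_name,
    LEAD(value, 2) OVER(PARTITION BY uuid ORDER BY property) AS last_name,
    value AS age,
    property = 'age' AS anchor
  FROM user_properties
)
-- Keep only the 'age' row of each uuid (WHERE, since there is no aggregation to filter).
WHERE anchor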
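
The LEAD variant relies on the alphabetical order of the property names (age, first_name, last_name), which is easy to check locally. The sketch below again uses Python's `sqlite3` as a stand-in for BigQuery (it assumes an SQLite build with window-function support, 3.25 or later) and filters the outer query on `property = 'age'`. The function name `pivot_with_lead` and the `xyz` rows are hypothetical.

```python
import sqlite3

LEAD_QUERY = """
    SELECT uuid, first_name, last_name, age
    FROM (
      SELECT
        uuid,
        property,
        LEAD(value, 1) OVER (PARTITION BY uuid ORDER BY property) AS first_name,
        LEAD(value, 2) OVER (PARTITION BY uuid ORDER BY property) AS last_name,
        value AS age
      FROM user_properties
    )
    WHERE property = 'age'   -- the anchor row: first in alphabetical order
    ORDER BY uuid
"""


def pivot_with_lead(rows):
    """Pivot (uuid, property, value) rows using LEAD instead of GROUP BY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE user_properties (uuid TEXT, property TEXT, value TEXT)"
        )
        conn.executemany("INSERT INTO user_properties VALUES (?, ?, ?)", rows)
        return conn.execute(LEAD_QUERY).fetchall()
    finally:
        conn.close()


# "abc" comes from the question; "xyz" is made-up sample data.
lead_rows = [
    ("abc", "first_name", "John"),
    ("abc", "last_name", "Connor"),
    ("abc", "age", "26"),
    ("xyz", "last_name", "Reese"),
    ("xyz", "age", "31"),
    ("xyz", "first_name", "Sarah"),
]

lead_result = pivot_with_lead(lead_rows)
```

One design caveat: the offsets are positional, so this only works when every uuid has exactly the expected properties. An extra property that sorts between age and first_name would shift values into the wrong columns, whereas the GROUP BY version is unaffected by extra rows.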


Source: https://stackoverflow.com/questions/35789156/how-to-aggregate-multiple-rows-into-one-in-bigquery
