How to transpose rows to columns with a large amount of data in BigQuery/SQL?

回眸只為那壹抹淺笑 submitted on 2019-11-26 11:39:22

Question


I have a problem transposing a large table in BigQuery (1.5 billion rows) from rows to columns. I could figure out how to do it with a small amount of data by hardcoding the columns, but not with this large amount. A snapshot of the table looks like this:

+------------+---------+-------+
| CustomerID | Feature | Value |
+------------+---------+-------+
| 1          | A123    | 3     |
| 1          | F213    | 7     |
| 1          | F231    | 8     |
| 1          | B789    | 9.1   |
| 2          | A123    | 4     |
| 2          | U123    | 4     |
| 2          | B789    | 12    |
| ..         | ..      | ..    |
| ..         | ..      | ..    |
| 400000     | A123    | 8     |
| 400000     | U123    | 7     |
| 400000     | R231    | 6     |
+------------+---------+-------+

So basically there are approximately 400,000 distinct CustomerIDs with 3,000 features, and not every CustomerID has the same features: some CustomerIDs may have 2,000 features while others have 3,000. The result table I would like to get has one row per distinct CustomerID and 3,000 columns, one for each feature. Like this:

CustomerID Feature1 Feature2 ... Feature3000

So some of the cells will have missing values.
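To make the target shape concrete, here is a minimal in-memory sketch in Python using only the sample rows from the table above. The helper name `pivot` is hypothetical, the approach obviously does not scale to 1.5 billion rows, and `None` stands in for a SQL NULL in the missing cells.

```python
# Long-format rows (CustomerID, Feature, Value) copied from the sample table.
rows = [
    (1, "A123", 3), (1, "F213", 7), (1, "F231", 8), (1, "B789", 9.1),
    (2, "A123", 4), (2, "U123", 4), (2, "B789", 12),
    (400000, "A123", 8), (400000, "U123", 7), (400000, "R231", 6),
]


def pivot(rows):
    """Turn (customer, feature, value) rows into one wide row per customer.

    Every distinct feature becomes a column; a customer that lacks a
    feature gets None in that column. If a (customer, feature) pair
    repeated, the last value seen would win in this sketch.
    """
    features = sorted({feature for _, feature, _ in rows})
    by_customer = {}
    for customer, feature, value in rows:
        by_customer.setdefault(customer, {})[feature] = value
    header = ["CustomerID"] + features
    table = [
        [customer] + [values.get(f) for f in features]
        for customer, values in sorted(by_customer.items())
    ]
    return header, table


header, table = pivot(rows)
print(header)
for row in table:
    print(row)
```

With the sample data, customer 2 ends up with `None` under F213, F231 and R231, which are exactly the "missing values" described above.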

Does anyone have an idea how to do this in BigQuery or SQL?

Thanks in advance.


Answer 1:


STEP #1

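The query below uses BigQuery legacy SQL (GROUP_CONCAT_UNQUOTED is a legacy SQL function). Replace both occurrences of yourTable with the real name of your table (in legacy SQL, e.g. [dataset.table]) and run it.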

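-- Builds the text of the pivot query: one MAX(IF(...)) expression
-- per distinct Feature, joined with commas by GROUP_CONCAT_UNQUOTED.
SELECT 'SELECT CustomerID, ' + 
   GROUP_CONCAT_UNQUOTED(
      'MAX(IF(Feature = "' + STRING(Feature) + '", Value, NULL))'
   ) 
   + ' FROM yourTable GROUP BY CustomerID'
-- The subquery returns each distinct Feature exactly once.
FROM (SELECT Feature FROM yourTable GROUP BY Feature) 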

The result is a single string: the text of the pivot query to run in the next step.
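The string that Step 1 produces can be sketched in Python as follows, assuming you already have the list of distinct features. `build_pivot_query` is a hypothetical helper; like GROUP_CONCAT_UNQUOTED with its default separator, it joins the per-feature expressions with commas. Sorting the features first is a choice made here for a predictable column order, not something the SQL above does.

```python
def build_pivot_query(features, table="yourTable"):
    """Return the pivot query text for the given distinct feature names.

    Mirrors Step 1: one MAX(IF(Feature = "...", Value, NULL)) expression
    per feature, comma-joined, wrapped in SELECT ... GROUP BY CustomerID.
    """
    expressions = [
        'MAX(IF(Feature = "' + str(feature) + '", Value, NULL))'
        for feature in features
    ]
    return (
        "SELECT CustomerID, "
        + ",".join(expressions)
        + " FROM " + table
        + " GROUP BY CustomerID"
    )


# Distinct features from the sample rows, sorted for a stable column order.
features = sorted({"A123", "F213", "F231", "B789", "U123", "R231"})
query = build_pivot_query(features)
print(query)
```

No escaping is done, so a feature name containing a double quote would break the generated query, just as it would in the Step 1 SQL.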

STEP #2

Take the string you got from Step 1 and execute it as a query.
The output is the pivot you asked for in the question.




Answer 2:


Hi @Jade, I posted a very similar question before and got a very helpful (and similar) answer from @MikhailBerlyant. For what it's worth, I had about 4,000 features to dummify and also ran into the "Resources exceeded during query execution" error.

I think this kind of large-scale data transformation (as opposed to querying) is better left to tools more suited to the task, such as Spark.
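If the single 3,000-column query hits the "Resources exceeded" error, one workaround that neither answer proposes, and that is not verified here against a 1.5-billion-row table, is to split the feature list into groups, run one narrower pivot query per group, and join the results on CustomerID. The sketch below only generates the query texts; `chunked_pivot_queries` and the chunk size of 500 are hypothetical choices.

```python
def chunked_pivot_queries(features, chunk_size, table="yourTable"):
    """Split features into groups of at most chunk_size and build one
    narrower pivot query per group (hypothetical workaround, not from the thread).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    queries = []
    for start in range(0, len(features), chunk_size):
        chunk = features[start:start + chunk_size]
        expressions = ",".join(
            'MAX(IF(Feature = "' + f + '", Value, NULL))' for f in chunk
        )
        queries.append(
            "SELECT CustomerID, " + expressions
            + " FROM " + table + " GROUP BY CustomerID"
        )
    return queries


# 3,000 placeholder feature names, split into pivot queries of 500 columns each.
features = ["F%04d" % i for i in range(3000)]
queries = chunked_pivot_queries(features, chunk_size=500)
print(len(queries))  # 6
```

Because every chunk query still groups the whole table by CustomerID, each CustomerID appears in every chunk's output (with NULLs for features it lacks), so the partial results can be combined with a plain join on CustomerID.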



Source: https://stackoverflow.com/questions/34798244/how-to-transpose-rows-to-columns-with-large-amount-of-the-data-in-bigquery-sql
