RANK or ROW_NUMBER in BigQuery over a large dataset

蓝咒 submitted on 2019-11-28 12:36:57

You didn't provide a working query, so I wrote my own; you'll need to translate it to your own problem space. I'm also not sure why you want to assign a row number to every row in such a huge dataset, but challenge accepted:

SELECT a.enc, plarf, plarf+COALESCE(INTEGER(sumc), (0)) row_num
FROM (
  SELECT STRING(year)+STRING(month)+STRING(mother_age)+state enc, 
         ROW_NUMBER() OVER (PARTITION BY year ORDER BY enc) plarf,
         year
  FROM [publicdata:samples.natality] ) a
LEFT JOIN (
  SELECT COUNT(*) c, year+1 year, SUM(c) OVER(ORDER BY year) sumc
  FROM [publicdata:samples.natality] 
  GROUP BY year
) b
ON a.year=b.year

  • I want to do a ROW_NUMBER() OVER() across the whole table, but I can't because there are too many rows.
  • Adding an OVER(PARTITION BY ...) fixes this issue, but now the numbering restarts at 1 in every partition.
  • But that's OK. In another subquery I count how many rows there are in each partition and keep a running total of those counts.
  • The surrounding query then takes each row's within-partition row number and adds the running total of all earlier partitions. The year+1 shift in the second subquery is what lines each year up with the total for the years before it.
  • Ta da.
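
If the arithmetic is easier to follow outside SQL, here is a minimal Python sketch of the same idea on a tiny in-memory list. It is not the BigQuery query itself: the function name global_row_numbers and the (partition_key, sort_key) tuple layout are made up for illustration. One difference worth noting: the sketch looks up each partition's offset by its own key, whereas the query above relies on year+1 and so assumes the years are consecutive.

```python
from collections import Counter
from itertools import accumulate


def global_row_numbers(rows):
    """Assign a global row number to (partition_key, sort_key) tuples.

    Hypothetical helper for illustration only. Returns a list of
    (partition_key, sort_key, local_rank, row_num) tuples.
    """
    # Step 1: count rows per partition (the COUNT(*) ... GROUP BY year part).
    counts = Counter(partition for partition, _ in rows)
    partitions = sorted(counts)

    # Step 2: running total of those counts (the SUM(c) OVER(ORDER BY year) part).
    running = list(accumulate(counts[p] for p in partitions))

    # Offset of a partition = number of rows in all earlier partitions.
    offsets = {p: (running[i - 1] if i > 0 else 0) for i, p in enumerate(partitions)}

    # Step 3: number rows inside each partition, then add the partition's offset.
    result = []
    local_rank = Counter()
    for partition, key in sorted(rows):
        local_rank[partition] += 1
        rank = local_rank[partition]
        result.append((partition, key, rank, offsets[partition] + rank))
    return result


sample = [(2001, "b"), (2000, "a"), (2001, "a"), (2000, "c"), (2002, "z")]
numbered = global_row_numbers(sample)
```

In numbered, the third field restarts at 1 for every year, while the last field runs from 1 to 5 without gaps, which is what the outer query's plarf + sumc computes.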