BigQuery GROUP_CONCAT and ORDER BY

前端 未结 3 1935
被撕碎了的回忆
被撕碎了的回忆 2020-12-10 19:14

I am currently using BigQuery and GROUP_CONCAT which works perfectly fine. However, when I try to add a ORDER BY clause to the GROUP_CONCAT statement like I would do in SQL,

相关标签:
3条回答
  • 2020-12-10 19:23

    Since BigQuery doesn't support ORDER BY clause inside GROUP_CONCAT function, this functionality can be achieved by use of analytic window functions. And in BigQuery separator for GROUP_CONCAT is simply a second parameter for the function. Below example illustrates this:

    select key, first(grouped_value) concat_value from (
    select 
      key, 
      group_concat(value, ':') over 
        (partition by key
         order by value asc
         rows between unbounded preceding and unbounded following) 
      grouped_value 
    from (
    select key, value from
    (select 1 as key, 'b' as value),
    (select 1 as key, 'c' as value),
    (select 1 as key, 'a' as value),
    (select 2 as key, 'y' as value),
    (select 2 as key, 'x' as value))) group by key
    

    Will produce the following:

    Row key concat_value     
    1   1   a:b:c    
    2   2   x:y
    

    NOTE on Window specification: The query uses "rows between unbounded preceding and unbounded following" window specification, to make sure that all rows within a partition participate in GROUP_CONCAT aggregation. Per SQL Standard default window specification is "rows between unbounded preceding and current row" which is good for things like running sum, but won't work correctly in this problem.

    Performance note: Even though it looks wasteful to recompute aggregation function multiple times, the BigQuery optimizer does recognize that since window is not changing result will be the same, so it only computes aggregation once per partition.

    0 讨论(0)
  • 2020-12-10 19:28

    Here is a version in Standard SQL mode in BigQuery with ARRAY_AGG as aggregate function:

    select key,
    array_agg(struct(grouped_value) order by array_length(grouped_value) desc limit 1)[offset(0)].*
    from (
    select 
      key, 
      array_agg(value) over 
        (partition by key
         order by value asc
         rows between unbounded preceding and unbounded following) 
      grouped_value 
    from (
    select key, value from unnest([
        struct(1 as key, "b" as value)
      , struct(1, "c")
      , struct(1, "a")
      , struct(2, "y")
      , struct(2, "x")
    ]))) group by key
    
    0 讨论(0)
  • 2020-12-10 19:36

    Standard SQL mode in BigQuery does support ORDER BY clause within some aggregate functions, including STRING_AGG, for example:

    #standardSQL
    select string_agg(t.x order by t.y) 
    from unnest([struct<x STRING, y INT64>('a', 5), ('b', 1), ('c', 10)]) t
    

    will result in

    b,a,c

    Documentation is here: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#using-order-by-with-aggregate-functions

    0 讨论(0)
提交回复
热议问题