Pick a random attribute from group in Redshift

Submitted by 元气小坏坏 on 2019-12-23 04:07:09

Question


I have a data set in the following form:

id  |   attribute
-----------------
1   |   a
2   |   b
2   |   a
2   |   a
3   |   c

Desired output:

attribute|  num
-------------------
a        |  1
b,a      |  1
c        |  1

In MySQL, I would use:

select attribute, count(*) num
from
   (select id, group_concat(distinct attribute) attribute from dataset group by id) as subquery
group by attribute;

I am not sure this can be done in Redshift, because it does not support group_concat or any of the PostgreSQL aggregate functions such as array_agg() or string_agg(). See this question.

An alternative that would also work is to pick a random attribute from each group instead of using group_concat. How can this be done in Redshift?


Answer 1:


I found a way to pick a random attribute for each id, but it is rather convoluted. I don't think it's a good approach, but it works.

SQL:

-- (1) uniq dataset: remove duplicate (id, attr) pairs
WITH uniq_dataset AS (SELECT id, attr FROM dataset GROUP BY id, attr)
SELECT
  uds.id, rds.attr
FROM
-- (2) generate random rank for each id: an integer from 1 to the number of distinct attrs
--     (floor(random() * n) + 1 is uniform over 1..n; round() would favour the middle ranks)
  (SELECT id,
          floor(random() * (SELECT count(*) FROM uniq_dataset iuds WHERE iuds.id = ouds.id))::int + 1 AS random_rk
   FROM (SELECT DISTINCT id FROM uniq_dataset) ouds) uds,
-- (3) rank table: number the attrs within each id
  (SELECT rank() OVER (PARTITION BY id ORDER BY attr) AS rk, id, attr FROM uniq_dataset) rds
WHERE
  uds.id = rds.id
AND
  uds.random_rk = rds.rk
ORDER BY
  uds.id;

Result:

 id | attr
----+------
  1 | a
  2 | a
  3 | c

OR

 id | attr
----+------
  1 | a
  2 | b
  3 | c

Here are the original table and the intermediate tables used in this SQL.

-- dataset (original table)
 id | attr
----+------
  1 | a
  2 | b
  2 | a
  2 | a
  3 | c

-- (1) uniq dataset
 id | attr
----+------
  1 | a
  2 | a
  2 | b
  3 | c

-- (2) generate random rank for each id
 id | random_rk
----+-----------
  1 |  1
  2 |  1 <- 1 or 2
  3 |  1

-- (3) rank table
 rk | id | attr
----+----+------
  1 |  1 | a
  1 |  2 | a
  2 |  2 | b
  1 |  3 | c



Answer 2:


This solution, inspired by Masashi, is simpler and selects a random element from each group in Redshift.

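-- The window covers the whole partition, so every row of an id gets the same random pick
SELECT id, attribute
FROM (SELECT id, FIRST_VALUE(attribute)
    OVER (PARTITION BY id ORDER BY random()
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS attribute
    FROM dataset) AS picked
-- collapse the repeated rows to one row per id
GROUP BY id, attribute ORDER BY id;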
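
A closely related way to pick one random row per group is to number the rows of each id in random order and keep the first one. This is only a sketch, assuming the table and column names from the question (dataset, id, attribute); the aliases rn and ranked are made up for illustration.

```sql
-- Number the rows of each id in random order, then keep row 1.
SELECT id, attribute
FROM (SELECT id, attribute,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY random()) AS rn
      FROM dataset) AS ranked
WHERE rn = 1
ORDER BY id;
```

Because duplicates are not removed first, an attribute that appears more often within an id is more likely to be picked. Selecting from a de-duplicated subquery instead of dataset would give each distinct attribute the same chance.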



Answer 3:


This is an answer for the related question here. That question is closed, so I am posting the answer here.

Here is a method to aggregate a column into a string:

select * from temp;
 attribute 
-----------
 a
 c
 b

1) Give a unique rank to each row

with sub_table as (select attribute, row_number() over (order by attribute) rnk from temp)
select * from sub_table;

 attribute | rnk 
-----------+-----
 a         |   1
 b         |   2
 c         |   3

2) Use the concatenation operator || to combine the rows into one line

with sub_table as (select attribute, row_number() over (order by attribute) rnk from temp)
select (select attribute from sub_table where rnk = 1)||
       (select attribute from sub_table where rnk = 2)||
       (select attribute from sub_table where rnk = 3) res_string;

 res_string 
------------
 abc

This only works for a fixed, known number of rows (X) in that column, because each rank needs its own subquery. The rows can be the first X rows ordered by some attribute in the "order by" clause. I'm guessing this is expensive.

A CASE expression can be used to deal with the NULLs that occur when a certain rank does not exist (concatenating NULL with || makes the whole result NULL).

with sub_table as (select attribute, row_number() over (order by attribute) rnk from temp)
select (select attribute from sub_table where rnk = 1)||
       (select attribute from sub_table where rnk = 2)||
       (select attribute from sub_table where rnk = 3)||
       (case when (select attribute from sub_table where rnk = 4) is NULL then ''
             else (select attribute from sub_table where rnk = 4) end) as res_string;
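
The NULL handling can probably be written more compactly with COALESCE, which replaces a missing rank with an empty string. This is a sketch under the same assumptions as above (a table named temp with a single attribute column and at most four rows). ROW_NUMBER() is used so that duplicate values still get distinct ranks.

```sql
WITH sub_table AS (
    SELECT attribute,
           ROW_NUMBER() OVER (ORDER BY attribute) AS rnk
    FROM temp
)
-- Each scalar subquery returns NULL when that rank does not exist;
-- COALESCE turns the NULL into '' so the concatenation is not lost.
SELECT COALESCE((SELECT attribute FROM sub_table WHERE rnk = 1), '') ||
       COALESCE((SELECT attribute FROM sub_table WHERE rnk = 2), '') ||
       COALESCE((SELECT attribute FROM sub_table WHERE rnk = 3), '') ||
       COALESCE((SELECT attribute FROM sub_table WHERE rnk = 4), '') AS res_string;
```

The query still has to list one subquery per rank, so the limit on the number of rows remains.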



Answer 4:


I haven't tested this query, but these functions are supported in Redshift:

select id, array_to_string(array(select attribute from mydataset m where m.id=d.id),',') from mydataset d
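
If array functions turn out not to be available, another option worth trying is the LISTAGG aggregate, which Redshift provides in more recent releases. It is not mentioned in the question or the answers above, so treat this as an untested sketch that assumes the question's dataset table; the aliases attr_list and per_id are made up for illustration.

```sql
-- Build one comma-separated list of distinct attributes per id,
-- then count how many ids share each list.
SELECT attr_list AS attribute, COUNT(*) AS num
FROM (SELECT id,
             LISTAGG(DISTINCT attribute, ',')
                 WITHIN GROUP (ORDER BY attribute) AS attr_list
      FROM dataset
      GROUP BY id) AS per_id
GROUP BY attr_list;
```

Because the list is ordered by attribute, id 2 would come out as a,b rather than the b,a shown in the desired output. The ordering keeps ids with the same set of attributes in the same bucket.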



Source: https://stackoverflow.com/questions/21084913/amazon-redshift-mechanism-for-aggregating-a-column-into-a-string
