Find Top 1000 entries along with count and rank from table

Question

I have a table with around 30 billion rows in Redshift with the following structure:

userid    itemid   country   start_date
uid1     itemid1  country1  2018-07-25 00:00:00
uid2     itemid2  country1  2018-07-25 00:00:00
uid3     itemid1  country2  2018-07-25 00:00:00
uid4     itemid3  country1  2018-07-25 00:00:00
uid5     itemid1  country1  2018-07-25 00:00:00
uid1     itemid2  country2  2018-07-25 00:00:00
uid2     itemid2  country2  2018-07-25 00:00:00

I want to find how many unique users bought each item, and then pick the top 1000 most-sold items for each country and start_date. Both the rank and the number of times each item was sold are required.

The following output is expected:

itemid     country   sold_count   start_date
itemid1    country1   2           2018-07-25 00:00:00
itemid2    country2   2           2018-07-25 00:00:00
itemid1    country2   1           2018-07-25 00:00:00
itemid2    country1   1           2018-07-25 00:00:00
itemid3    country1   1           2018-07-25 00:00:00

I am trying to use the rank function, but I am not getting the expected result.

I am trying the following query:

  select itemid, start_date, Rank() over (partition by itemid order by 
  count(distinct(userid)) desc) as rank1
  from table_name 
  group by item_id, start_date
  order by rank1 desc;

Also, I want a column with the count of unique userid values that bought each itemid, grouped by country and start_date. In the above query, I have ignored the country column to simplify the query.

Please help me.

Answer 1:

If I assume that "version" means "country", then I think you want:

select *
from (select itemid, country, start_date, count(distinct userid) as num_users,
             row_number() over (partition by country, start_date 
                                order by count(distinct userid) desc
                               ) as seqnum
      from table_name 
      group by itemid, country, start_date
     ) x
where seqnum <= 1000
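
To sanity-check this approach against the sample rows from the question, here is a minimal sketch that runs an equivalent query through Python's built-in sqlite3 module. SQLite (3.25 or later, for window functions) is only a stand-in for Redshift here, and that is an assumption of this sketch. The function name top_items, the top_n parameter, and the extra itemid tie-breaker in the window ORDER BY are hypothetical additions for illustration. The aggregation is done in a CTE first so that the window function orders by a plain column.

```python
import sqlite3

# Sample rows copied from the question: (userid, itemid, country, start_date).
ROWS = [
    ("uid1", "itemid1", "country1", "2018-07-25 00:00:00"),
    ("uid2", "itemid2", "country1", "2018-07-25 00:00:00"),
    ("uid3", "itemid1", "country2", "2018-07-25 00:00:00"),
    ("uid4", "itemid3", "country1", "2018-07-25 00:00:00"),
    ("uid5", "itemid1", "country1", "2018-07-25 00:00:00"),
    ("uid1", "itemid2", "country2", "2018-07-25 00:00:00"),
    ("uid2", "itemid2", "country2", "2018-07-25 00:00:00"),
]

QUERY = """
with counts as (
    -- unique buyers per item, country and start_date
    select itemid, country, start_date,
           count(distinct userid) as num_users
    from table_name
    group by itemid, country, start_date
),
ranked as (
    -- number the items inside each (country, start_date) partition;
    -- itemid is an added tie-breaker so the order is deterministic
    select counts.*,
           row_number() over (partition by country, start_date
                              order by num_users desc, itemid) as seqnum
    from counts
)
select itemid, country, start_date, num_users, seqnum
from ranked
where seqnum <= ?
order by country, start_date, seqnum
"""


def top_items(rows, top_n):
    """Load rows into an in-memory table and return the top_n items per partition."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table table_name "
        "(userid text, itemid text, country text, start_date text)"
    )
    conn.executemany("insert into table_name values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, (top_n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_items(ROWS, 1000):
        print(row)
```

On the sample data, itemid1 leads country1 and itemid2 leads country2, each with 2 unique users, which agrees with the expected output in the question.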

Answer 2:

 select tab1.itemid, tab1.country, tab1.sold_count, tab.start_date
 from (select itemid, start_date, count(*) as scount
       from table_name
       group by itemid, start_date
       order by scount desc
       limit 1000) tab,
      (select itemid, country, count(*) as sold_count
       from table_name
       group by itemid, country) tab1
 where tab.itemid = tab1.itemid

Answer 3:

As your question says, you want to count the unique users who bought each item and then pick the top 1000 most-sold items for each country and start_date, so you can do exactly this step by step with CTEs instead of writing a single query:

with 
 items_by_country as (
    select 
     itemid
    ,country
    ,count(distinct userid) as sold_count
    ,min(start_date) as start_date
    from table_name
    group by 1,2
)
,ranked_groups as (
    select 
     *
    ,row_number() over (partition by country order by sold_count desc) as rn
    from items_by_country
)
select *
from ranked_groups
where rn<=1000
order by 1,2,3 desc
;
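
The answers here number rows with row_number(), while the question asks for a rank, and the two behave differently when counts tie. As a rough illustration, the sketch below compares row_number(), rank() and dense_rank() on the country1 counts derived from the question's sample rows. It again uses Python's sqlite3 as a stand-in for Redshift, which is an assumption, and the table name item_counts and the function compare_rankings are hypothetical.

```python
import sqlite3

# Unique-buyer counts for country1, derived from the question's sample rows.
COUNTS = [("itemid1", 2), ("itemid2", 1), ("itemid3", 1)]


def compare_rankings(counts):
    """Return (itemid, sold_count, row_number, rank, dense_rank) per item."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table item_counts (itemid text, sold_count integer)")
    conn.executemany("insert into item_counts values (?, ?)", counts)
    rows = conn.execute(
        """
        select itemid, sold_count,
               -- itemid breaks ties so row_number is deterministic
               row_number() over (order by sold_count desc, itemid) as rn,
               rank()       over (order by sold_count desc) as rnk,
               dense_rank() over (order by sold_count desc) as drnk
        from item_counts
        order by rn
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in compare_rankings(COUNTS):
        print(row)
```

Here itemid2 and itemid3 tie on sold_count, so rank() and dense_rank() give both of them 2, while row_number() gives 2 and 3. Filtering on rank() <= 1000 can therefore return more than 1000 rows per partition when there is a tie at the cutoff, whereas row_number() never exceeds 1000 but picks arbitrarily among tied items unless a tie-breaker column is added to the ORDER BY.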

Source: https://stackoverflow.com/questions/51579015/find-top-1000-entries-along-with-count-and-rank-from-table

Tags

sql

amazon-redshift