Question
I have a table with around 30 billion rows in Redshift with the following structure:
userid itemid country start_date
uid1 itemid1 country1 2018-07-25 00:00:00
uid2 itemid2 country1 2018-07-25 00:00:00
uid3 itemid1 country2 2018-07-25 00:00:00
uid4 itemid3 country1 2018-07-25 00:00:00
uid5 itemid1 country1 2018-07-25 00:00:00
uid1 itemid2 country2 2018-07-25 00:00:00
uid2 itemid2 country2 2018-07-25 00:00:00
I want to find how many unique users bought each item, and then pick the top 1000 most-sold items for each country and start_date. Both the rank and the number of times each item was sold are required.
The following output is expected:
itemid country sold_count start_date
itemid1 country1 2 2018-07-25 00:00:00
itemid2 country2 2 2018-07-25 00:00:00
itemid1 country2 1 2018-07-25 00:00:00
itemid2 country1 1 2018-07-25 00:00:00
itemid3 country1 1 2018-07-25 00:00:00
I am trying to use a rank function, but I am not getting the expected result.
This is the query I am trying:
select itemid, start_date, Rank() over (partition by itemid order by
count(distinct(userid)) desc) as rank1
from table_name
group by item_id, start_date
order by rank1 desc;
I also want a column with the count of unique userids that bought each item, grouped by country and start_date. In the query above I left out the country column to simplify it.
Please help me.
Answer 1:
If I assume that "version" means "country", then I think you want:
select *
from (select itemid, country, start_date, count(distinct userid) as num_users,
             row_number() over (partition by country, start_date
                                order by count(distinct userid) desc
                               ) as seqnum
      from table_name
      group by itemid, country, start_date
     ) x
where seqnum <= 1000
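One way to sanity-check this approach is to run it locally against the sample rows from the question. The sketch below is not Redshift: it assumes Python's built-in sqlite3 module with SQLite 3.25 or later (needed for window functions). It also aggregates in an inner query before ranking, and adds itemid as a tie-breaker so the output is deterministic. The name top_items is hypothetical.

```python
import sqlite3

# Sample rows copied from the question: (userid, itemid, country, start_date)
ROWS = [
    ("uid1", "itemid1", "country1", "2018-07-25 00:00:00"),
    ("uid2", "itemid2", "country1", "2018-07-25 00:00:00"),
    ("uid3", "itemid1", "country2", "2018-07-25 00:00:00"),
    ("uid4", "itemid3", "country1", "2018-07-25 00:00:00"),
    ("uid5", "itemid1", "country1", "2018-07-25 00:00:00"),
    ("uid1", "itemid2", "country2", "2018-07-25 00:00:00"),
    ("uid2", "itemid2", "country2", "2018-07-25 00:00:00"),
]

# Step 1 (innermost): distinct buyers per item, country and start_date.
# Step 2: number the items inside each (country, start_date) group.
# Step 3: keep only the first top_n items of each group.
# Assumption: ties are broken by itemid, which the thread does not specify.
QUERY = """
select itemid, country, start_date, sold_count, seqnum
from (select itemid, country, start_date, sold_count,
             row_number() over (partition by country, start_date
                                order by sold_count desc, itemid) as seqnum
      from (select itemid, country, start_date,
                   count(distinct userid) as sold_count
            from table_name
            group by itemid, country, start_date) counts
     ) ranked
where seqnum <= ?
order by country, start_date, seqnum
"""


def top_items(rows, top_n):
    """Load rows into an in-memory SQLite table and return the top_n items per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table table_name "
        "(userid text, itemid text, country text, start_date text)"
    )
    conn.executemany("insert into table_name values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, (top_n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_items(ROWS, 1000):
        print(row)
```

With top_n set to 1000 this returns all five item/country combinations from the expected output, each with its distinct-user count and its position within the country and start_date group.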
Answer 2:
select tab1.itemid, tab1.country, tab1.sold_count, tab.start_date
from (select itemid, start_date, count(*) as scount
      from table_name
      group by itemid, start_date
      order by scount desc
      limit 1000) tab,
     (select itemid, country, count(*) sold_count
      from table_name
      group by itemid, country) tab1
where tab.itemid = tab1.itemid
Answer 3:
As your question says, you want "to find item's are bought by how many unique users and then pick top 1000 most sold item for each country and start_date", so you can do exactly this step by step with CTEs instead of writing a single query:
with
items_by_country as (
    select
        itemid
        ,country
        ,count(distinct userid)
        ,min(start_date) as start_date
    from table_name
    group by 1,2
)
,ranked_groups as (
    select
        *
        ,row_number() over (partition by country order by count desc)
    from items_by_country
)
select *
from ranked_groups
where row_number<=1000
order by 1,2,3 desc
;
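Since the question asks for a rank, it may help to see how row_number() differs from rank() and dense_rank() when two items have the same count. The sketch below uses Python's built-in sqlite3 module (assuming SQLite 3.25 or later) rather than Redshift. The table item_counts, its values, and the function compare_rankings are hypothetical and made up for illustration.

```python
import sqlite3

# Hypothetical per-item counts for a single country and start_date.
# itemid2 and itemid3 are tied on purpose.
COUNTS = [
    ("itemid1", 5),
    ("itemid2", 3),
    ("itemid3", 3),
    ("itemid4", 1),
]

# row_number() always gives distinct positions (ties split here by itemid),
# rank() gives tied rows the same position and then skips ahead,
# dense_rank() gives tied rows the same position without skipping.
QUERY = """
select itemid, sold_count,
       row_number() over (order by sold_count desc, itemid) as row_num,
       rank()       over (order by sold_count desc)         as rnk,
       dense_rank() over (order by sold_count desc)         as dense_rnk
from item_counts
order by row_num
"""


def compare_rankings(counts):
    """Return (itemid, sold_count, row_num, rnk, dense_rnk) for each item."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table item_counts (itemid text, sold_count integer)")
    conn.executemany("insert into item_counts values (?, ?)", counts)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in compare_rankings(COUNTS):
        print(row)
```

Filtering on row_number() returns at most 1000 rows per group, while filtering on rank() can return more when items are tied at the cut-off. Which of the two is wanted depends on how ties should be treated.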
Source: https://stackoverflow.com/questions/51579015/find-top-1000-entries-along-with-count-and-rank-from-table