Question
I have a Presto table; assume it has the columns [id, name, update_time] and the data
(1, Amy, 2018-08-01),
(1, Amy, 2018-08-02),
(1, Amyyyyyyy, 2018-08-03),
(2, Bob, 2018-08-01)
Now I want to run a SQL query whose result is
(1, Amyyyyyyy, 2018-08-03),
(2, Bob, 2018-08-01)
Currently, the best way I have found to deduplicate in Presto is the following:
select
t1.id,
t1.name,
t1.update_time
from table_name t1
join (select id, max(update_time) as update_time from table_name group by id) t2
on t1.id = t2.id and t1.update_time = t2.update_time
For more information, see deduplication in SQL.
Is there a better way to deduplicate in Presto?
Answer 1:
In PrestoDB, I would be inclined to use row_number():
select id, name, update_time
from (select t.*,
             row_number() over (partition by id order by update_time desc) as seqnum
      from table_name t
     ) t
where seqnum = 1;
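For a quick check, the row_number() approach can be run against the question's sample rows inlined with VALUES. This is only a sketch: the CTE stands in for the real table_name, and the partition key (id) and ordering column (update_time) are taken from the question rather than from any real schema.

```sql
-- Stand-in for the real table, built from the sample rows in the question.
with table_name as (
    select *
    from (values
        (1, 'Amy',       date '2018-08-01'),
        (1, 'Amy',       date '2018-08-02'),
        (1, 'Amyyyyyyy', date '2018-08-03'),
        (2, 'Bob',       date '2018-08-01')
    ) as v (id, name, update_time)
)
select id, name, update_time
from (
    select t.*,
           -- Number the rows of each id from newest to oldest.
           row_number() over (partition by id order by update_time desc) as seqnum
    from table_name t
) ranked
where seqnum = 1   -- keep only the newest row per id
order by id;
-- Rows returned: (1, Amyyyyyyy, 2018-08-03) and (2, Bob, 2018-08-01)
```

Unlike a join on max(update_time), this keeps exactly one row per id even when two rows share the latest update_time. Which of the tied rows survives is arbitrary unless the order by includes a tiebreaker column.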
Answer 2:
You seem to want a correlated subquery:
select t.*
from table_name t
where update_time = (select MAX(t1.update_time) from table_name t1 where t1.id = t.id);
Answer 3:
Just use the in operator:
select t.*
from tableA t
where update_time in (select MAX(tableA.update_time) from tableA group by id)
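One caveat worth checking: the in subquery compares update_time only, without tying it to id, so a row can match the maximum of a different id. A sketch using the question's sample rows inlined with VALUES (the CTE named tableA is a stand-in for the real table):

```sql
-- Stand-in for the real table, built from the sample rows in the question.
with tableA as (
    select *
    from (values
        (1, 'Amy',       date '2018-08-01'),
        (1, 'Amy',       date '2018-08-02'),
        (1, 'Amyyyyyyy', date '2018-08-03'),
        (2, 'Bob',       date '2018-08-01')
    ) as v (id, name, update_time)
)
select t.*
from tableA t
where update_time in (select max(update_time) from tableA group by id)
order by id, update_time;
-- The subquery yields the dates 2018-08-03 and 2018-08-01, so the row
-- (1, Amy, 2018-08-01) is returned too: its date equals the maximum for id 2.
```

Correlating the subquery on id (for example, adding where t1.id = t.id inside it and comparing with =) avoids the extra row.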
Answer 4:
It's easy:
Select id, max_by(name, update_time) as name, MAX(update_time) as last_update from table_name Group by id
Hope it helps
Source: https://stackoverflow.com/questions/51630164/how-to-deduplicate-in-presto