How to deduplicate in Presto

Posted by 落花浮王杯 on 2019-12-24 05:14:14

Question


I have a Presto table. Assume it has the columns [id, name, update_time] and the following data:

(1, Amy, 2018-08-01),
(1, Amy, 2018-08-02),
(1, Amyyyyyyy, 2018-08-03),
(2, Bob, 2018-08-01)

Now I want to run a SQL query whose result is:

(1, Amyyyyyyy, 2018-08-03),
(2, Bob, 2018-08-01)

Currently, the best way I have found to deduplicate in Presto is:

select 
    t1.id, 
    t1.name,
    t1.update_time 
from table_name t1
join (select id, max(update_time) as update_time from table_name group by id) t2
    on t1.id = t2.id and t1.update_time = t2.update_time

For more background, see "deduplication in SQL".

Is there a better way to deduplicate in Presto?


Answer 1:


In PrestoDB, I would be inclined to use row_number():

select id, name, update_time
from (select t.*,
             row_number() over (partition by id order by update_time desc) as seqnum
      from table_name t
     ) t
where seqnum = 1;
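
To try the row_number() approach without an existing table, here is a minimal sketch that inlines the sample rows from the question. The names sample, ranked, and seqnum are placeholders, and update_time is assumed to be a DATE column; the question does not state its type.

```sql
-- Inline the sample rows from the question (update_time assumed to be a DATE).
WITH sample AS (
    SELECT *
    FROM (
        VALUES
            (1, 'Amy',       DATE '2018-08-01'),
            (1, 'Amy',       DATE '2018-08-02'),
            (1, 'Amyyyyyyy', DATE '2018-08-03'),
            (2, 'Bob',       DATE '2018-08-01')
    ) AS t (id, name, update_time)
)
SELECT id, name, update_time
FROM (
    SELECT
        s.*,
        -- Number the rows within each id, newest update_time first.
        row_number() OVER (PARTITION BY id ORDER BY update_time DESC) AS seqnum
    FROM sample s
) ranked
-- Keep only the newest row per id.
WHERE seqnum = 1;
```

This should return the two rows the question asks for. One difference from the join-based query: if two rows of the same id share the latest update_time, row_number() keeps exactly one of them (chosen arbitrarily unless the ORDER BY gets a tie-breaker), whereas the join returns both.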



Answer 2:


You seem to want a correlated subquery:

select t.*
from table_name t
where update_time = (select max(t1.update_time) from table_name t1 where t1.id = t.id);



Answer 3:


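Just use the IN operator:

select t.*
from tableA t
where update_time in (select max(update_time) from tableA group by id)

Note that this compares update_time only, so a stale row is still returned whenever its update_time equals the latest update_time of a different id. With the sample data, (1, Amy, 2018-08-01) matches Bob's latest update_time and is not removed.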



Answer 4:


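It's easy. Presto requires every non-aggregated column to appear in GROUP BY, so take name from the latest row with max_by():

select id, max_by(name, update_time) as name, max(update_time) as last_update from table_name group by id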

Hope it helps



Source: https://stackoverflow.com/questions/51630164/how-to-deduplicate-in-presto
