how to join tables by comparing two fields, and also considering performance

允我心安 提交于 2020-01-14 02:29:23

问题


This should be simple but I can't get it. I need to do a select to get newer date values for some accounts.

I start here, T1:

+----------+---------+
|  date   | account |
+----------+---------+
| 4/1/2018 |       1 |
| 4/1/2018 |       2 |
| 4/1/2018 |       3 |
| 4/1/2018 |       4 |
| 4/1/2018 |       5 |
+----------+---------+

Then some dates are updated in T2:

+----------+---------+
|   date   | account |
+----------+---------+
| 7/1/2018 |       1 |
| 7/1/2018 |       2 |
+----------+---------+

How can I get this output into T3, updating just those accounts?

+----------+---------+
|   date   | account |
+----------+---------+
| 7/1/2018 |       1 |
| 7/1/2018 |       2 |
| 4/1/2018 |       3 |
| 4/1/2018 |       4 |
| 4/1/2018 |       5 |
+----------+---------+

I can do a join on the account number, but what about the ones that didn't change? How to capture those?

Also, T1 has about 8 million records, so performance would be a factor. Extracting from Teradata, loading into Hive.

Thanks!


回答1:


I think you want:

select t2.*
from t2
union all
select t1.*
from t1
where not exists (select 1 from t2 where t2.account = t1.account);

This selects first from t2. Then it takes remaining accounts from t1.




回答2:


Just an addition to previously good answers.. try using it with coalesce also and let me know if it improves performance.

select t1.Account, coalesce(t2.Date, t1.Date) 
from t1
left outer join t2
  on t2.Account = t1.Account



回答3:


Here is another solution with left outer join:

select t1.Account, case when t2.Date is null then t1.Date else t2.Date end
from t1
left outer join t2 on t2.Account = t1.Account


来源:https://stackoverflow.com/questions/53015165/how-to-join-tables-by-comparing-two-fields-and-also-considering-performance

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!