Write a nested select statement with a where clause in Hive

若如初见. Submitted on 2019-12-07 08:15:24

Subqueries inside a WHERE clause were not supported in Hive before version 0.13; from 0.13 on, IN, NOT IN, EXISTS and NOT EXISTS subqueries are allowed there: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries

On older versions, you can often use a JOIN instead to get the same result: https://karmasphere.com/hive-queries-on-table-data#join_syntax

For example, this query:

   SELECT a.KEY, a.value
   FROM a
   WHERE a.KEY IN
   (SELECT b.KEY FROM b);

can be rewritten to:

   SELECT a.KEY, a.value
   FROM a LEFT SEMI JOIN b ON (a.KEY = b.KEY);

Looking at the business requirements underlying your question, it occurs to me that you might get more efficient results by partitioning your Hive table by hour. If the data can be written with the hour as the partition key, then the query that updates the summary only has to read the relevant partitions, so it will be much faster and require fewer resources.

Partitions can get out of hand when they number in the millions, but this seems like a case that will not come close to that limit.
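
A rough sketch of what hourly partitioning could look like. The table name, the columns, and the hour format are all made up for illustration, since the actual schema is not shown in the question:

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE events_by_hour (
  id      STRING,
  payload STRING
)
PARTITIONED BY (event_hour STRING);  -- assumed format, e.g. '2019-12-07-08'

-- Filtering on the partition column lets Hive read only the matching
-- partition instead of scanning the whole table.
SELECT COUNT(*)
FROM events_by_hour
WHERE event_hour = '2019-12-07-08';
```

With one partition per hour, a year of data is under 9,000 partitions, which is far from the millions mentioned above.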

It will work if you use IN:

select * from TableA where TA_timestamp in (select timestmp from TableB where id="hourDim")

  • EXPLANATION: The operators >, < and = need a single value on the right-hand side, whereas this subquery returns multiple values, which only the IN clause can accept.
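
On a Hive version without IN subqueries, the same filter could probably be expressed as a LEFT SEMI JOIN instead. This is only a sketch using the table and column names from the query above; the aliases ta and tb are introduced here for readability, and the id filter is placed in the ON clause on the assumption that your Hive version accepts it there:

```sql
-- Semi-join form of the IN query: keep rows of TableA whose timestamp
-- appears in TableB for id = 'hourDim'.
SELECT ta.*
FROM TableA ta
LEFT SEMI JOIN TableB tb
  ON (ta.TA_timestamp = tb.timestmp AND tb.id = 'hourDim');
```

Note that with LEFT SEMI JOIN the right-hand table (tb here) can only be referenced in the ON clause, not in the SELECT list or the WHERE clause, which is why the id condition sits inside ON.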