How to use NOT IN in Hive

Submitted by 不羁岁月 on 2021-02-06 04:37:27


Suppose I have the two tables shown below. I want the result that SQL would give with something like insert into B select * from A where id not in (select id from B), which would insert the row 3 George into table B.

How can I implement this in Hive?

Table A

id  name      
1   Rahul     
2   Keshav    
3   George

Table B

id  name      
1   Rahul     
2   Keshav    
4   Yogesh   


NOT IN in the WHERE clause with uncorrelated subqueries has been supported since Hive 0.13, which was released on 21 April 2014.

select * from A where id not in (select id from B where id is not null);

| id |  name  |
|  3 | George |
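
To load the missing rows into B, as the question asks, the same filter can feed an INSERT ... SELECT. This is a minimal sketch, assuming B has the same columns as A in the same order (id, name); it has not been run against a specific Hive version.

```sql
-- Assumption: A and B share the same column layout (id, name).
-- Append to B every row of A whose id is not already present in B.
insert into table B
select * from A where id not in (select id from B where id is not null);
```

With the sample data above, this should append the single row 3 George to B.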

On earlier versions, the outer table's column must be qualified with the table name or alias; otherwise the query fails:

hive> select * from A where id not in (select id from B where id is not null);
FAILED: SemanticException [Error 10249]: Line 1:22 Unsupported SubQuery Expression 'id': Correlating expression cannot contain unqualified column references.

hive> select * from A where A.id not in (select id from B where id is not null);
3   George
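
On versions older than Hive 0.13, where subqueries in the WHERE clause are not available at all, a left outer join is a common substitute. The sketch below is an alternative technique rather than part of the original answer; the aliases a and b are only illustrative.

```sql
-- Anti-join: keep the rows of A that have no matching id in B.
select a.id, a.name
from A a
left outer join B b on a.id = b.id
where b.id is null;  -- no match found in B
```

Unlike NOT IN, this form also keeps rows of A whose id is NULL, since they never match anything in B.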

When using NOT IN, add an IS NOT NULL filter to the inner query unless you are certain that the relevant column contains no NULL values.
A single NULL is enough to make the query return no rows, because comparing any value with NULL yields NULL rather than true.
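
If filtering NULLs in the inner query feels fragile, a correlated NOT EXISTS avoids the problem, since it only tests whether a matching row exists. Hive has supported it since 0.13 as well. This is a sketch of an alternative, not the original answer's method; the aliases a and b are illustrative.

```sql
-- NULL ids in B never satisfy b.id = a.id, so they cannot hide rows of A.
select a.id, a.name
from A a
where not exists (select 1 from B b where b.id = a.id);
```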

