问题
I was trying to parameterise the value in hive rather than hard coding it in query. Below is the query.
select * from employee where sal >30000
But rather than using 30000 value as hard coded I need that to come from the same query like below. But I am running into issues :
select * from employee where sal > (select max(sal) from employee)
Any help is appreciated.
Thanks
回答1:
You can try using this form of Hive query. This will get the employees having salary equal to the highest salary.
SELECT e1.* FROM employee e1 
JOIN
(SELECT MAX(sal) as max_sal FROM employee) e2
ON e1.sal = e2.max_sal;
Example:
Table: employee
id  fname   sal
1   AAA     15000
2   BBB     35000
3   CCC     12000
4   DDD     35000
5   EEE     9000
Query execution output:
2   BBB     35000
4   DDD     35000
回答2:
Hive does not support such subqueries and also does not allow to calculate a variable, variables in Hive are simple text substitution without calculation. You can calculate predicate in a shell and pass to your hive script like in this answer: https://stackoverflow.com/a/37821218/2700344
If you want to do it in the same hive query, nothing wrong in calculating subquery and do a cross join with it's result, then filter. subquery will be calculated first, then it's result placed in the distributed cache and applied in the filter in each mapper reading the table:
with sub as(--this is example only and makes no sense
            --replace with real query
            --of course there is no rows with sal>max sal in the same table
select max(S.sal) AS MaxSal from employee S  
)
select * 
  from employee e 
       cross join sub s  
where e.sal>s.MaxSal 
If you write it without CROSS JOIN, simply from employee e, sub s, or JOIN without on condition, it is still the same cross join, better write it explicitly using cross join. 
来源:https://stackoverflow.com/questions/56447957/get-the-value-from-subquery-in-hive