Error in Hive : Underlying error: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more arguments are expected

ⅰ亾dé卋堺 submitted on 2019-12-05 23:23:44

I have run into the same error. rank() is case sensitive in Hive, and the error message gives nothing away. Try changing RANK() to rank().

My guess is that it has to do with the coalesce inside your rank. Analytic functions work in HiveQL but are more limited. I would do all your joins and sums in an inner query and then do the rank in an outer query. This is often required because HiveQL does not always follow the order of operations you would expect from a typical SQL dialect. Consider a table of stock information:

select count(*) as COUNT
from NYSE_STOCKS
where date in ('2001-12-20','2001-12-21','2001-12-24') and exchange = 'NYSE';

Now consider the following query:

select 
  exchange
  , date
  , count(*) over (partition by exchange) 
from NYSE_STOCKS 
where date in ('2001-12-20','2001-12-21','2001-12-24') 
group by exchange, date;

You would expect the following results:

EXCHANGE | DATE       | COUNT
NYSE     | 2001-12-20 | 5199
NYSE     | 2001-12-21 | 5199
NYSE     | 2001-12-24 | 5199 

But you would actually get this in HiveQL:

EXCHANGE | DATE       | COUNT
NYSE     | 2001-12-20 | 3
NYSE     | 2001-12-21 | 3
NYSE     | 2001-12-24 | 3

To get the expected results, you have to apply the analytic function in an inner query and do the group by in the outer query:

select 
  exchange
  , date
  , count
from (
  select 
        exchange
        , date
        , count(*) over (partition by exchange) as count
  from NYSE_STOCKS 
  where date in ('2001-12-20','2001-12-21','2001-12-24') 
) A
group by exchange, date, count
;
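
The difference between the two query shapes can be reproduced on a toy table. The sketch below is only a stand-in under stated assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or newer, for window functions) rather than Hive, the five sample rows are made up, and the columns are renamed to exch and trade_date to sidestep keyword clashes. SQLite also evaluates window functions after GROUP BY, so it shows the same effect on a small scale.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption: SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nyse_stocks (exch TEXT, trade_date TEXT, symbol TEXT)")

# Hypothetical sample data: 2 + 2 + 1 = 5 rows over three dates.
rows = [
    ("NYSE", "2001-12-20", "AAA"),
    ("NYSE", "2001-12-20", "BBB"),
    ("NYSE", "2001-12-21", "AAA"),
    ("NYSE", "2001-12-21", "BBB"),
    ("NYSE", "2001-12-24", "AAA"),
]
conn.executemany("INSERT INTO nyse_stocks VALUES (?, ?, ?)", rows)

# Window function and GROUP BY in the same query block:
# the window sees the grouped rows, so it counts groups.
same_level = conn.execute("""
    SELECT exch, trade_date, COUNT(*) OVER (PARTITION BY exch) AS cnt
    FROM nyse_stocks
    GROUP BY exch, trade_date
    ORDER BY trade_date
""").fetchall()

# Window function in the inner query, GROUP BY in the outer query:
# the window sees the raw rows, so it counts rows.
nested = conn.execute("""
    SELECT exch, trade_date, cnt
    FROM (
        SELECT exch, trade_date, COUNT(*) OVER (PARTITION BY exch) AS cnt
        FROM nyse_stocks
    ) a
    GROUP BY exch, trade_date, cnt
    ORDER BY trade_date
""").fetchall()

print(same_level)  # cnt is 3 on every row (number of date groups)
print(nested)      # cnt is 5 on every row (number of raw rows)
```

With this sample data the first query reports 3 on each row and the second reports 5, mirroring the 3 versus 5199 difference in the results above.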

So in summary, it's always good to think about the order of operations when using analytic functions, and to get the data you are working with into its simplest form before you apply the analytic function.
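
For the original rank problem, the same idea would mean computing the sums (and any coalesce) in an inner query and ranking the already-aggregated rows in the outer query. The sketch below illustrates that shape only; it again runs on Python's built-in sqlite3 module rather than Hive, and the trades table, its columns, and the sample values are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption: SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, volume INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [("AAA", 100), ("AAA", 50), ("BBB", 300), ("CCC", 20), ("CCC", 30)],
)

# Inner query: reduce the data to one row per symbol (sum plus coalesce).
# Outer query: rank() only sees plain, already-computed columns.
ranked = conn.execute("""
    SELECT symbol, total_volume,
           rank() OVER (ORDER BY total_volume DESC) AS volume_rank
    FROM (
        SELECT symbol, COALESCE(SUM(volume), 0) AS total_volume
        FROM trades
        GROUP BY symbol
    ) t
    ORDER BY volume_rank
""").fetchall()

print(ranked)  # [('BBB', 300, 1), ('AAA', 150, 2), ('CCC', 50, 3)]
```

Keeping the expression inside rank() down to a single column name also makes it easier to spot which part of the query a vague error message is complaining about.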

Funnily enough, I actually hit this same error today. The problem for me was that one of the columns I was using in my analytic function was not a valid column. Without knowing what columns your tables provide, it's impossible for me to prove this is your problem, but you may want to make sure all the columns in your RANK are valid.

Does not look like a valid "Hive" query to me. Remember that Hive's query language is pretty limited compared to SQL. For example, depending on the Hive version, "IN" may not be supported. Another example is RANK() OVER (...), which older versions do not support either. In other words, attempting to use RDBMS SQL directly in Hive will often not work.
