Hive: A cleaner way to SELECT AS and GROUP BY

Submitted by 妖精的绣舞 on 2020-05-25 06:44:19

Question


I am trying to write a Hive SQL query like this:

SELECT count(1), substr(date, 1, 4) as year
FROM ***
GROUP BY year

However, Hive does not recognize the alias 'year' and fails with: FAILED: SemanticException [Error 10004]: Line 1:79 Invalid table alias or column reference 'year'

One solution (Hive: SELECT AS and GROUP BY) suggests using 'GROUP BY substr(date, 1, 4)'.

It works! However, in some cases the value I want to group by is produced by several lines of Hive function calls, and it is very ugly to write code like

SELECT count(1), func1(func2(..........................)) AS something
FROM ***
GROUP BY func1(func2(..........................))

Is there a cleaner way to do this in Hive? Any suggestions?


Answer 1:


Specifying the column position in GROUP BY will solve your issue. On Hive 0.12, the position number in GROUP BY works even with SET hive.groupby.orderby.position.alias=false;

SELECT count(1), substr(date, 1, 4) as year  
FROM ***
GROUP BY 2;



Answer 2:


In Hive 0.11.0 and later, columns can be specified by position if hive.groupby.orderby.position.alias is set to true (the default is false). Adding set hive.groupby.orderby.position.alias=true; to your .hql file (or to .hiverc for a permanent solution) will do the trick, and you can then write GROUP BY 2 for the example above. Source: Hive Language Manual
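
As a rough sketch of the above, assuming your Hive version accepts this session-level setting: my_table is a hypothetical stand-in for the table elided as *** in the question, and date is backtick-quoted only because it may be treated as a reserved word in some Hive versions.

```sql
-- Enable positional references in GROUP BY / ORDER BY for this session.
SET hive.groupby.orderby.position.alias=true;

-- my_table is a hypothetical placeholder for the table elided in the question.
SELECT count(1), substr(`date`, 1, 4) AS year
FROM my_table
GROUP BY 2;  -- 2 refers to the second SELECT expression, i.e. year
```

The expression is written only once, so a long chain of nested function calls does not have to be repeated in the GROUP BY clause.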




Answer 3:


One solution that comes to mind is to move the GROUP BY into an outer query:

SELECT count(*), year FROM
(
   SELECT substr(date, 1, 4) AS year FROM ***
) t
GROUP BY year
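
The same idea could also be written with a common table expression, which Hive supports from version 0.13.0 onward. This is only a sketch: my_table and the CTE name yearly are hypothetical names introduced for illustration, since the question elides the real table as ***.

```sql
-- Compute the derived column once inside the CTE (yearly is a hypothetical name).
WITH yearly AS (
  SELECT substr(`date`, 1, 4) AS year
  FROM my_table  -- hypothetical placeholder for the elided table
)
-- The alias is an ordinary column here, so GROUP BY can reference it by name.
SELECT count(*), year
FROM yearly
GROUP BY year;
```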

GL!



Source: https://stackoverflow.com/questions/29443253/hive-more-clean-way-to-select-as-and-group-by
