Hive: Is there a way to get the aggregates of all the numeric columns existing in a table?

走远了吗. submitted on 2021-01-29 14:37:44

Question


I have a table with over 50 columns (both numeric and char). Is there a way to get overall statistics without specifying each column?

As an example:

a   b   c   d
1   2   3   4
5   6   7   8
9   10  11  12

Ideally I would have something like:

column_name  min  avg  max  sum
a            1    5    9    15
b            2    6    10   18
c            3    7    11   21
d            4    8    12   24

Even getting one aggregate at a time would be more than helpful.

Any help/idea would be highly appreciated.

Thank you,
O
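The row-per-column layout shown above (column_name, min, avg, max, sum) could be produced with the same DESCRIBE-parsing idea used in the answer below: generate one aggregate SELECT per numeric column and join them with UNION ALL. This is a minimal sketch, not the answer's method. It assumes DESCRIBE output arrives on stdin as whitespace-separated "name type ..." lines ending at the first blank line, that the type list in the regex covers your schema, and build_stats_query is a hypothetical helper name. It only builds the query text, so it can be tried without Hive.

```shell
#!/usr/bin/env bash
# Sketch: emit a query returning one row per numeric column with
# column_name, min, avg, max, sum (the layout requested in the question).
# Assumptions: DESCRIBE text on stdin, column list ends at the first blank
# line, and the numeric type list below matches your schema.
build_stats_query() {
  local table="$1"
  awk -v tbl="$table" -v q="'" '
    !NF { exit }    # a blank line separates columns from partition info
    $2 ~ /^(tinyint|smallint|int|bigint|float|double|decimal)/ {
      # Cast to double so every UNION ALL branch has the same column types.
      printf "%sselect %s%s%s as column_name", sep, q, $1, q
      printf ", cast(min(%s) as double) as min_val, cast(avg(%s) as double) as avg_val", $1, $1
      printf ", cast(max(%s) as double) as max_val, cast(sum(%s) as double) as sum_val", $1, $1
      printf " from %s", tbl
      sep = " union all "
    }'
}
```

Typical use would be to pipe the output of the answer's hive -S "describe" command into build_stats_query "$TABLE_NAME" and run the resulting text with hive -e. Each UNION ALL branch is its own aggregation, so on a 50-column table this may scan the data many times; the casts keep branch types compatible but lose decimal precision, and some older Hive releases only accept UNION ALL inside a subquery.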


Answer 1:


You can parse the DESCRIBE output with AWK and generate a comma-separated string of SUM(col) as sum_col for the numeric columns and a plain column list for all other columns. This example builds a SELECT statement that groups by the non-numeric columns. Run in a shell:

TABLE_NAME=your_schema.your_table

NUMERIC_COLUMNS=$(hive -S -e "set hive.cli.print.header=false; describe ${TABLE_NAME};" | awk 'f&&!NF{exit} {f=1} $2=="int"||$2=="double"{printf "%ssum(%s) as sum_%s", c, toupper($1), $1; c=","}')

GROUP_BY_COLUMNS=$(hive -S -e "set hive.cli.print.header=false; describe ${TABLE_NAME};" | awk 'f&&!NF{exit} {f=1} $2!="int"&&$2!="double"{printf "%s%s", c, toupper($1); c=","}')

SELECT_STATEMENT="select $NUMERIC_COLUMNS, $GROUP_BY_COLUMNS from $TABLE_NAME group by $GROUP_BY_COLUMNS"

hive -e "$SELECT_STATEMENT"

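This checks only int and double columns; add more types (bigint, decimal, etc.) as needed. The generated statement also assumes the table has at least one numeric and one non-numeric column. You can optimize it further by executing DESCRIBE only once and parsing the saved result with the same AWK scripts. Hope you get the idea.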
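The DESCRIBE-once optimization and the wider type check mentioned above could look roughly like this. It is a sketch under stated assumptions: NUMERIC_RE, describe_to_list and build_sum_query are hypothetical names, the numeric type list is an assumption to adapt, and the functions only build the query text, so they can be tried without a Hive installation.

```shell
#!/usr/bin/env bash
# Sketch of the "execute DESCRIBE only once" idea: the saved DESCRIBE text
# is parsed twice by the same AWK logic.
# Assumption: this numeric type list; extend it to match your schema.
NUMERIC_RE='^(tinyint|smallint|int|bigint|float|double|decimal)'

# Print a comma-separated list built from DESCRIBE text on stdin.
#   numeric -> "sum(col) as sum_col" for each numeric column
#   other   -> "col" for each non-numeric column
describe_to_list() {
  awk -v mode="$1" -v re="$NUMERIC_RE" '
    !NF { exit }    # a blank line ends the column list
    {
      is_num = ($2 ~ re)
      if (mode == "numeric" && is_num) { printf "%ssum(%s) as sum_%s", sep, $1, $1; sep = "," }
      if (mode == "other" && !is_num) { printf "%s%s", sep, $1; sep = "," }
    }'
}

# Build the SUM ... GROUP BY statement from a table name and saved DESCRIBE text.
build_sum_query() {
  local table="$1" describe_out="$2" nums others
  nums=$(printf '%s\n' "$describe_out" | describe_to_list numeric)
  others=$(printf '%s\n' "$describe_out" | describe_to_list other)
  if [ -z "$nums" ]; then
    echo "no numeric columns found in $table" >&2
    return 1
  fi
  if [ -n "$others" ]; then
    echo "select $nums, $others from $table group by $others"
  else
    # Only numeric columns: aggregate over the whole table.
    echo "select $nums from $table"
  fi
}
```

Assuming TABLE_NAME is set as in the answer, DESCRIBE then runs once and the query is executed with:

DESCRIBE_OUT=$(hive -S -e "set hive.cli.print.header=false; describe ${TABLE_NAME};")
hive -e "$(build_sum_query "$TABLE_NAME" "$DESCRIBE_OUT")"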



Source: https://stackoverflow.com/questions/58008031/hive-is-there-a-way-to-get-the-aggregates-of-all-the-numeric-columns-existing-i
