Is it possible to concat a string field after group by in Hive

Posted by 我只是一个虾纸丫 on 2019-12-12 10:37:29

Question


I am evaluating Hive and need to concatenate a string field after a group by. I found a function named "concat_ws", but it looks like I have to explicitly list all the values to be concatenated. I am wondering whether I can do something like the following with concat_ws in Hive. As an example, I have a table named "my_table" with two fields, country and city. I want only one record per country, and each record should have two fields: country and cities.

select country, concat_ws(city, "|") as cities
from my_table
group by country

Is this possible in Hive? I am using Hive 0.11 from CDH5 right now.


Answer 1:


In database management an aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning or measurement such as a set, a bag or a list.

Source: Aggregate function - Wikipedia

Hive's built-in aggregate functions are listed on the following page:
Built-in Aggregate Functions (UDAF - user defined aggregation function)

So, the only built-in option for Hive 0.11 (Hive 0.13 and above also has collect_list) is:
array collect_set(col)

This will do what you want provided there are no duplicate city records per country, since collect_set returns a set of objects with duplicate elements eliminated. Otherwise, create your own UDAF or aggregate outside of Hive.
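
As a rough sketch of how collect_set could be combined with concat_ws for the query in the question: concat_ws takes the separator as its first argument and also accepts an array of strings, so the array produced by the aggregate can be joined directly. The table and column names (my_table, country, city) come from the question; the assumption that city is a string column, and the collect_list variant for newer versions, are illustrative rather than part of the original answer. The order of the cities within each joined string is not guaranteed.

```sql
-- Sketch only: assumes my_table(country STRING, city STRING) as described in the question.
-- collect_set gathers the distinct cities of each group into an array<string>;
-- concat_ws('|', ...) then joins that array with '|' (separator comes first).
SELECT country,
       concat_ws('|', collect_set(city)) AS cities
FROM my_table
GROUP BY country;

-- On Hive 0.13 and above, collect_list keeps duplicate cities instead of dropping them.
SELECT country,
       concat_ws('|', collect_list(city)) AS cities
FROM my_table
GROUP BY country;
```

If city were not a string column, it would need a cast to string before being collected, since concat_ws joins strings.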

References for writing UDAF:

  • Writing GenericUDAFs: A Tutorial
  • HivePlugins
  • Create/Drop Function


Source: https://stackoverflow.com/questions/30010573/is-it-possible-to-concat-a-string-field-after-group-by-in-hive
