Hive query: select rows where another column's values match specific keywords, then return the matches as a new column

不问归期 submitted on 2020-06-27 18:37:06

Question


I need to run a query and create a new column in HiveQL.

For example,

app      col1

app1     anybody love me?
app2     I hate u
app3     this hat is good
app4     I don't like this one
app5     oh my god
app6     damn you.
app7     such nice girl
app8     xxxxx
app9     pretty prefect
app10    don't love me.
app11    xxx anybody?

I want to match col1 against a keyword list like ['anybody', 'love', 'you', 'xxx', 'don't'] and return the matched keywords in a new column named keyword, as follows:

app      keyword

app1     anybody, love
app4     I don't like this one
app6     damn you.
app8     xxx
app10    don't, love
app11    xxx

It seems that I have to use a nested query.
The logic is roughly: select the rows that match, then store the matched keywords (as a list or something similar) in a new column.

But I am not familiar enough with HiveQL.
Could anyone help me?
Thanks in advance.


Answer 1:


You could turn the list of words into a table and join it with your table using pattern matching:

select t.app, k.keyword
from  mytable t
inner join (values ('anybody'), ('you'), ('xxx'), ('don''t')) as k(keyword)
    on t.col1 like concat('%', k.keyword, '%')

Note that this returns app more than once if more than one keyword matches a phrase. You did not specify how you want to handle this case.

In Hive, you can also phrase this as:

select t.app, k.keyword
from  mytable t
inner join table(values 'anybody', 'you', 'xxx', 'don''t') as k(keyword)
    on t.col1 like concat('%', k.keyword, '%')



Answer 2:


In Hive you can use the stack UDTF:

with keywords as (
select stack(4, --the number of tuples
'anybody', 'you', 'xxx', 'don\'t'
) as keyword
)

select t.app, k.keyword
from  mytable t
inner join keywords k
    on t.col1 like concat('%', k.keyword, '%')

Also, in older versions of Hive a join on a LIKE condition will not work. In that case, use a cross join with the same stack CTE and filter in the WHERE clause:

select t.app, k.keyword
from  mytable t
cross join keywords k
where t.col1 like concat('%', k.keyword, '%')
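
The queries above return one row per (app, keyword) match, while the expected output in the question shows one row per app with the matches combined. A minimal sketch of one way to collapse them, assuming the built-in collect_list and concat_ws functions are acceptable here; the inline mytable CTE is a hypothetical stand-in built from a few of the question's sample rows so the query can be tried on its own, and 'love' is added to the keyword list to match the question:

```sql
-- Hypothetical stand-in for the real table, using a few sample rows from the question.
with mytable as (
    select stack(4,
        'app1',  'anybody love me?',
        'app2',  'I hate u',
        'app8',  'xxxxx',
        'app10', 'don\'t love me.'
    ) as (app, col1)
),
-- Keyword list from the question, one keyword per row.
keywords as (
    select stack(5, 'anybody', 'love', 'you', 'xxx', 'don\'t') as keyword
)
-- Pair every phrase with every keyword, keep the substring matches,
-- then gather the matches for each app into one comma-separated string.
select t.app,
       concat_ws(', ', collect_list(k.keyword)) as keyword
from mytable t
cross join keywords k
where t.col1 like concat('%', k.keyword, '%')
group by t.app;
```

Rows with no matching keyword, such as app2, drop out of the result. The order of keywords inside each combined string is not guaranteed. Because LIKE with % does substring matching, 'xxx' also matches 'xxxxx'.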


Source: https://stackoverflow.com/questions/62077780/hive-query-select-a-column-based-on-the-condition-another-columns-values-match
