Question
I need to run a query and create a new column in HiveQL.
For example,
app col1
app1 anybody love me?
app2 I hate u
app3 this hat is good
app4 I don't like this one
app5 oh my god
app6 damn you.
app7 such nice girl
app8 xxxxx
app9 pretty prefect
app10 don't love me.
app11 xxx anybody?
I want to match col1 against a keyword list like ['anybody', 'love', 'you', 'xxx', 'don't']
and return the matched keywords as a new column named keyword,
as follows:
app keyword
app1 anybody, love
app4 I don't like this one
app6 damn you.
app8 xxx
app10 don't, love
app11 xxx
It seems that I have to use a nested query.
The logic is to select the rows that match and put the matched keywords, collected in a list or something similar, into a new column.
But I am not familiar enough with HiveQL.
Could anyone help me?
Thanks in advance.
Answer 1:
You could turn the list of words into a table and join it with your table using pattern matching:
select t.app, k.keyword
from mytable t
inner join (values ('anybody'), ('you'), ('xxx'), ('don''t')) as k(keyword)
on t.col1 like concat('%', k.keyword, '%')
Note that this will duplicate app if more than one keyword matches a phrase. You did not specify how you want to handle this case.
In Hive, you can also phrase this as:
select t.app, k.keyword
from mytable t
inner join table(values 'anybody', 'you', 'xxx', 'don''t') as k(keyword)
on t.col1 like concat('%', k.keyword, '%')
Answer 2:
In Hive you can use the stack UDTF:
with keywords as (
select stack(4, --the number of tuples
'anybody', 'you', 'xxx', 'don\'t'
) as keyword
)
select t.app, k.keyword
from mytable t
inner join keywords k
on t.col1 like concat('%', k.keyword, '%')
Also, in older versions of Hive a join using like will not work; use a cross join with stack and filter in the WHERE clause instead:
-- keywords is the same CTE as in the query above
select t.app, k.keyword
from mytable t
cross join keywords k
where t.col1 like concat('%', k.keyword, '%')
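The queries above return one row per matching keyword, while the expected output in the question shows one row per app with the keywords separated by commas. A minimal sketch of one way to collapse the matches, assuming the table and column names from the question (mytable, app, col1) and adding 'love' from the question's keyword list, is to group by app and combine collect_list with concat_ws. This is an illustration under those assumptions, not something the answers above propose:

```sql
-- Assumption: 'love' is included, following the keyword list in the question,
-- so stack() is given 5 values instead of 4.
with keywords as (
    select stack(5, -- the number of tuples
                 'anybody', 'love', 'you', 'xxx', 'don\'t'
           ) as keyword
)
select t.app,
       -- collect_list gathers the matched keywords of each app into an array;
       -- concat_ws joins that array into one comma-separated string
       concat_ws(', ', collect_list(k.keyword)) as keyword
from mytable t
cross join keywords k
where t.col1 like concat('%', k.keyword, '%')
group by t.app
```

Hive does not guarantee the order of elements produced by collect_list, so the keywords may not appear in the order shown in the question. If the same app can occur on several rows, collect_set can be used instead to drop repeated keywords.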
Source: https://stackoverflow.com/questions/62077780/hive-query-select-a-column-based-on-the-condition-another-columns-values-match