Question
I have some records where each row belongs to one or more categories (data type: array of strings), and a separate list of unique categories (data type: string). I need to match every row against the unique list and create a flag column for each category.
Input:
------
ID Category
1 ["Physics","Math"]
2 ["Math"]
3 ["Math","Chemistry"]
4 ["Physics","Computer"]
I also have a separate list of unique categories, stored locally in an Excel file, like this:
Unique Category
["Physics"]
["Math"]
["Chemistry"]
["Computer"]
The final output should look like this:
ID Category Math_F Physics_F Computer_F Chemistry_F
1 ["Physics","Math"] 1 1 0 0
2 ["Math"] 1 0 0 0
3 ["Math","Chemistry"] 1 0 0 1
4 ["Physics","Computer"] 0 1 1 0
Can someone please help with the query, the steps, and an explanation? I am new to Hive.
Answer 1:
Using array_contains():
SELECT ID,
Category,
CASE
WHEN array_contains(Category, 'Math') THEN 1
ELSE 0
END Math_F,
CASE
WHEN array_contains(Category, 'Physics') THEN 1
ELSE 0
END Physics_F,
CASE
WHEN array_contains(Category, 'Computer') THEN 1
ELSE 0
END Computer_F,
CASE
WHEN array_contains(Category, 'Chemistry') THEN 1
ELSE 0
END Chemistry_F
FROM TABLE t;
If you want the flag columns to be built dynamically from your list of unique categories, use another tool to generate the query; a shell script, for example, can do it.
The following example builds the SQL from a predefined array. You can easily extend it to read the array from a file:
#!/bin/bash
#define array
array=( Physics Math Computer Chemistry )
#initial sql
sql="select ID,
Category,"
#get length of array
arraylength=${#array[@]}
#get first flag column
columns="CASE
WHEN array_contains(Category,'${array[0]}') THEN 1
ELSE 0
END ${array[0]}_F"
#attach all other flags:
for (( i=1; i<arraylength; i++ ))
do
columns="$columns,
CASE
WHEN array_contains(Category,'${array[$i]}') THEN 1
ELSE 0
END ${array[$i]}_F"
done
#final SQL
sql="$sql
$columns
from table t;
"
#print result
echo "$sql"
Result (keyword case and spacing tidied for readability):
SELECT ID,
Category,
CASE
WHEN array_contains(Category, 'Physics') THEN 1
ELSE 0
END Physics_F,
CASE
WHEN array_contains(Category, 'Math') THEN 1
ELSE 0
END Math_F,
CASE
WHEN array_contains(Category, 'Computer') THEN 1
ELSE 0
END Computer_F,
CASE
WHEN array_contains(Category, 'Chemistry') THEN 1
ELSE 0
END Chemistry_F
FROM TABLE t;
You can add a Hive call to the above script to execute the query: hive -e "$sql", or save the query to a file instead.
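As a rough sketch of the "read the array from a file" variant mentioned above, the script below reads the category names from a text file instead of a hard-coded array. It assumes the Excel list has been exported to a plain text file with one category per line, either bare (Physics) or in the bracketed form shown in the question (["Physics"]). The function name build_flag_sql, the temporary file, and the table name my_table are placeholders for illustration, not part of the original answer.

```shell
#!/bin/bash
# Build a Hive query with one flag column per category listed in a file.
# Assumption: one category per line, e.g. Physics or ["Physics"], no header line.

build_flag_sql() {
  local file="$1" table="$2"
  local columns="" line name alias
  while IFS= read -r line || [ -n "$line" ]; do
    # Strip the surrounding brackets, double quotes and any carriage return.
    name=${line//\[/}
    name=${name//\]/}
    name=${name//\"/}
    name=${name//$'\r'/}
    # Skip empty lines.
    [ -z "$name" ] && continue
    # Column aliases may only contain identifier characters.
    alias="${name//[^A-Za-z0-9_]/_}_F"
    columns="$columns,
       CASE WHEN array_contains(Category, '$name') THEN 1 ELSE 0 END AS $alias"
  done < "$file"
  printf 'SELECT ID,\n       Category%s\nFROM %s t;\n' "$columns" "$table"
}

# Demo with a throwaway file; replace it with the real export of the Excel list.
tmpfile=$(mktemp)
printf '%s\n' '["Physics"]' '["Math"]' '["Chemistry"]' '["Computer"]' > "$tmpfile"
sql=$(build_flag_sql "$tmpfile" "my_table")
rm -f "$tmpfile"

# Print the generated query; to run it instead, use: hive -e "$sql"
echo "$sql"
```

Because the query is generated from the file, adding a new category only means adding a line to that file and regenerating the SQL. Category names containing a single quote would need escaping before being embedded in the query; this sketch does not handle that case.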
Source: https://stackoverflow.com/questions/52239349/hive-query-matching-column-values-from-array-of-string-to-make-flags