问题
I'm Looking for a way to split the column based on comma delimited data. Below is my dataset
id col1 col2
1 5,6 7,8
I want to get the result
id col1 col2
1 5 7
1 6 8
The position of the index should match because I need to fetch results accordingly.
I tried the below query but it returns the cartesian product.
Query:
SELECT col3, col4
FROM test ext
lateral VIEW explode(split(col1,'\002')) col1 AS col3
lateral VIEW explode(split(col2,'\002')) col2 AS col4
Result:
id col1 col2
1 5 7
1 5 8
1 6 7
1 6 8
回答1:
You can use posexplode() to create position index columns for your split arrays. Then, select only those rows where the position indices are equal.
SELECT id, col3, col4
FROM test
lateral VIEW posexplode(split(col1,'\002')) col1 AS pos3, col3
lateral VIEW posexplode(split(col2,'\002')) col2 AS pos4, col4
WHERE pos3 = pos4;
Output:
id col3 col4
1 5 7
1 6 8
Reference: Hive language manual - posexplode()
来源:https://stackoverflow.com/questions/37585638/hive-split-delimited-columns-over-multiple-rows-select-based-on-position