Question
I have a table in the following format:
create table raw_data (
  userid BIGINT,
  city VARCHAR,
  campaign ARRAY(
    ROW(
      campaignid BIGINT,
      campaign_start_at TIMESTAMP,
      campaign_ends_at TIMESTAMP,
      parameters ARRAY(
        ROW(goal VARCHAR, reward VARCHAR)
      ),
      campaignstatus ARRAY(
        ROW(seen BOOLEAN, seen_at TIMESTAMP, action VARCHAR, action_at TIMESTAMP)
      )
    )
  )
)
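To try the queries in this thread without the Hive table, a single hypothetical row with the same shape can be built inline. This is a sketch assuming Presto/Trino ROW and ARRAY syntax; the values are copied from the last user 3 row of the expected output below, and the CTE simply reuses the name raw_data so the other queries can be pasted after it.

```sql
-- Hypothetical sample data (not from the question's table): one user,
-- one campaign, one goal/reward entry and one status entry.
WITH raw_data AS (
  SELECT
    CAST(3 AS BIGINT) AS userid,
    CAST('Athens' AS VARCHAR) AS city,
    -- Build anonymous ROWs, then cast once to the named ROW type so the
    -- field names (campaignid, goal, seen, ...) match the schema above.
    CAST(
      ARRAY[
        ROW(
          234,
          TIMESTAMP '2019-03-19 12:00:00',
          TIMESTAMP '2019-03-19 14:00:00',
          ARRAY[ROW('10', '2.7')],
          ARRAY[ROW(true, TIMESTAMP '2019-03-19 12:23:00', 'blocked', TIMESTAMP '2019-03-19 12:24:00')]
        )
      ]
      AS ARRAY(ROW(
        campaignid BIGINT,
        campaign_start_at TIMESTAMP,
        campaign_ends_at TIMESTAMP,
        parameters ARRAY(ROW(goal VARCHAR, reward VARCHAR)),
        campaignstatus ARRAY(ROW(seen BOOLEAN, seen_at TIMESTAMP, action VARCHAR, action_at TIMESTAMP))
      ))
    ) AS campaign
)
SELECT * FROM raw_data;
```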
I want the final result to look like this:
userid | city   | campaignid | campaign_start_at | campaign_ends_at | goal | reward | seen | seen_at          | action      | action_at
1      | Athens | 234        | 2019-03-19 12:00  | 2019-03-19 14:00 | 10   | 2.7    | yes  | 2019-03-19 10:23 | null        | null
1      | Athens | 234        | 2019-03-19 12:00  | 2019-03-19 14:00 | 10   | 2.7    | yes  | 2019-03-17 10:23 | participate | 2019-03-19 11:20
2      | Athens | 234        | 2019-03-19 12:00  | 2019-03-19 14:00 | 10   | 2.7    | yes  | 2019-03-19 10:23 | ignore      | 2019-03-19 10:10
3      | Athens | 234        | 2019-03-19 12:00  | 2019-03-19 14:00 | 10   | 2.7    | null | null             | null        | null
3      | Athens | 234        | 2019-03-19 12:00  | 2019-03-19 14:00 | 10   | 2.7    | yes  | 2019-03-19 12:23 | blocked     | 2019-03-19 12:24
In other words, I want to unnest the data and get the information at the userid level. I tried to unnest the table with the following query:
select *
FROM raw_data
LEFT JOIN UNNEST(campaign) as t(campaigns)
but it returns the error: Table hive.default.campaign does not exist
My questions are:
Is it possible to unnest multiple arrays in parallel in Presto?
- If yes, how do I do that?
- If not, in what order should I unnest the columns to get down to the userid level, e.g. inside-out or outside-in? An example would be much appreciated.
Answer 1:
I found a solution. It is rather simple, but it works.
To unnest all the nested arrays, work from the outer array towards the inner arrays. For this example:
- first, unnest the campaign array per userid;
- second, unnest the campaignstatus array per userid and campaignid;
- third, unnest the parameters array.
Important note: the parameters array does not have to be unnested. Since all its fields are strings, it can also be treated as a single object rather than an array and its fields read directly, for example with JSON functions.
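The three steps above do not strictly need a subquery: in Presto an UNNEST may reference columns produced by an earlier item in the same FROM clause, so the levels can be chained outer to inner. A minimal sketch, assuming the schema from the question; the aliases r, c, cs and p are illustrative. Because parameters and campaignstatus are both unnested per campaign, the result is their cross product within each campaign, which matches the expected output only when parameters holds a single goal/reward entry.

```sql
SELECT
  r.userid,
  r.city,
  c.campaignid,
  c.campaign_start_at,
  c.campaign_ends_at,
  p.goal,
  p.reward,
  cs.seen,
  cs.seen_at,
  cs.action,
  cs.action_at
FROM raw_data AS r
-- Level 1: one row per campaign of each user. UNNEST on an array of
-- ROWs returns one column per ROW field, named here in field order.
CROSS JOIN UNNEST(r.campaign) AS c (
  campaignid, campaign_start_at, campaign_ends_at, parameters, campaignstatus
)
-- Level 2: one row per status entry of each (userid, campaignid).
CROSS JOIN UNNEST(c.campaignstatus) AS cs (seen, seen_at, action, action_at)
-- Level 3: one row per goal/reward entry of the same campaign.
CROSS JOIN UNNEST(c.parameters) AS p (goal, reward);
```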
More specifically, the query looks like this:
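select
  a.userid
  ,a.city
  ,a.campaignid
  ,a.campaign_start_at
  ,a.campaign_ends_at
  ,cs.seen
  ,cs.seen_at
  ,cs.action
  ,cs.action_at
  -- parameters treated as a single object: read the fields of its first entry
  ,element_at(a.parameters, 1).goal as goal
  ,element_at(a.parameters, 1).reward as reward
from (
  -- step 1: one row per campaign of each user; UNNEST on an array of
  -- ROWs returns one column per ROW field, named in the alias list
  select
    userid
    ,city
    ,campaignid
    ,campaign_start_at
    ,campaign_ends_at
    ,parameters
    ,campaignstatus
  from raw_data
  cross join unnest(campaign) as c (campaignid, campaign_start_at, campaign_ends_at, parameters, campaignstatus)
) a
-- step 2: one row per status entry of each (userid, campaignid)
cross join unnest(a.campaignstatus) as cs (seen, seen_at, action, action_at)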
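One caveat with CROSS JOIN UNNEST: when the unnested array is empty, the outer row produces no output at all, so a campaign with an empty campaignstatus array disappears from the result. If rows like the user 3 row with null seen/action columns in the expected output come from such empty arrays (an assumption; they might also be status entries whose fields are null), LEFT JOIN UNNEST ... ON TRUE keeps them. This is a sketch for engines that support LEFT JOIN UNNEST, such as Trino; whether it is available depends on the Presto version.

```sql
SELECT
  r.userid,
  r.city,
  c.campaignid,
  c.campaign_start_at,
  c.campaign_ends_at,
  cs.seen,
  cs.seen_at,
  cs.action,
  cs.action_at
FROM raw_data AS r
CROSS JOIN UNNEST(r.campaign) AS c (
  campaignid, campaign_start_at, campaign_ends_at, parameters, campaignstatus
)
-- LEFT JOIN keeps a campaign whose campaignstatus array is empty and
-- fills the status columns with NULL instead of dropping the row.
LEFT JOIN UNNEST(c.campaignstatus) AS cs (seen, seen_at, action, action_at)
  ON TRUE;
```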
However, I would love to read more sophisticated solutions.
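As for unnesting arrays in parallel: Presto's UNNEST accepts several array arguments and expands them side by side, pairing element k of each array in row k and padding the shorter array with NULLs; WITH ORDINALITY adds the position as an extra column. This zips the arrays by position rather than crossing them, so it only fits when parameters and campaignstatus are aligned element by element, which the question does not state. A minimal sketch under that assumption, with illustrative aliases:

```sql
SELECT
  r.userid,
  c.campaignid,
  t.n,
  t.goal,
  t.reward,
  t.seen,
  t.action
FROM raw_data AS r
CROSS JOIN UNNEST(r.campaign) AS c (
  campaignid, campaign_start_at, campaign_ends_at, parameters, campaignstatus
)
-- Both arrays hold ROWs, so each expands into its fields: goal, reward
-- from parameters, then the four status fields, then the ordinality n.
CROSS JOIN UNNEST(c.parameters, c.campaignstatus) WITH ORDINALITY
  AS t (goal, reward, seen, seen_at, action, action_at, n);
```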
Source: https://stackoverflow.com/questions/59567759/how-to-perform-multiple-array-unnest-in-parallel-in-presto