Question
I have a table with the following format:
create table raw_data (
userid BIGINT,
city VARCHAR,
campaign ARRAY <
STRUCT < campaignid BIGINT,
campaign_start_at TIMESTAMP,
campaign_ends_at TIMESTAMP,
parameters ARRAY <
STRUCT < goal VARCHAR,
reward VARCHAR
>
>,
campaignstatus ARRAY <
STRUCT < seen BOOLEAN,
seen_at TIMESTAMP,
action VARCHAR,
action_at TIMESTAMP
>
>
>
>)
I want the final result to be like this:
userid | city   | campaignid | campaign_start_at | campaign_ends_at | goal | reward | seen | seen_at          | action      | action_at
1      | Athens | 234        | 2019-03-19 12:00  | 2019-03-19 14:00 | 10   | 2.7    | yes  | 2019-03-19 10:23 | null        | null
1      | Athens | 234        | 2019-03-19 12:00  | 2019-03-19 14:00 | 10   | 2.7    | yes  | 2019-03-17 10:23 | participate | 2019-03-19 11:20
2      | Athens | 234        | 2019-03-19 12:00  | 2019-03-19 14:00 | 10   | 2.7    | yes  | 2019-03-19 10:23 | ignore      | 2019-03-19 10:10
3      | Athens | 234        | 2019-03-19 12:00  | 2019-03-19 14:00 | 10   | 2.7    | null | null             | null        | null
3      | Athens | 234        | 2019-03-19 12:00  | 2019-03-19 14:00 | 10   | 2.7    | yes  | 2019-03-19 12:23 | blocked     | 2019-03-19 12:24
In other words, I want to unnest the data and get the information at the userid level. I have tried to unnest the table using the following script:
select *
FROM raw_data
LEFT JOIN UNNEST(campaign) as t(campaigns)
but it returns the error: Table hive.default.campaign does not exist
My questions are:
Is it possible to unnest multiple arrays in parallel in Presto?
- If yes, how do I do that?
- If not, what order should I follow to unnest the columns at the higher level (userid), e.g. inside-out or vice versa? An example would be much appreciated.
Answer 1:
So basically I found a solution, rather simple but it works.
In order to unnest all the nested arrays, you need to work from the outer array towards the inner array. For this example:
- first, unnest the campaign array based on userid
- secondly, unnest the campaignstatus array based on userid and campaignid
- thirdly, unnest the parameters array
Important note: the parameters array may be manipulated as an object (not an array), as all the data are strings and can be accessed with JSON functions.
More specifically, the query will be like this:
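select
a.userid
,a.city
,a.campaignid
,a.campaign_start_at
,a.campaign_ends_at
,cs.seen
,cs.seen_at
,cs.action
,cs.action_at
-- parameters is read through JSON functions: position 0 as goal, position 1 as reward
,json_array_get(cast(a.parameters as json),0) goal
,json_array_get(cast(a.parameters as json),1) reward
from (
-- step 1: one row per (userid, campaign); the nested arrays are kept for the next step
select
userid
,city
,c.campaignid
,c.campaign_start_at
,c.campaign_ends_at
,c.parameters
,c.campaignstatus
from raw_data
cross join unnest(campaign) as c (campaignid, campaign_start_at, campaign_ends_at, parameters, campaignstatus)
) a
-- step 2: one row per campaignstatus entry of each campaign
cross join unnest(a.campaignstatus) as cs (seen, seen_at, action, action_at)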
However, I would love to read more sophisticated solutions.
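One possible variant, offered only as a sketch: the same outer-to-inner order can be written as a chain of CROSS JOIN UNNEST clauses in a single query, with parameters unnested like the other arrays instead of being read through JSON functions. This assumes a Presto version in which UNNEST expands an array of rows into one column per field; the aliases r, c, p and cs are illustrative names, not part of the original query.

```sql
-- Sketch only: assumes UNNEST expands ARRAY<ROW> elements into one column per field.
-- The aliases r, c, p and cs are illustrative, not taken from the original query.
SELECT
    r.userid,
    r.city,
    c.campaignid,
    c.campaign_start_at,
    c.campaign_ends_at,
    p.goal,
    p.reward,
    cs.seen,
    cs.seen_at,
    cs.action,
    cs.action_at
FROM raw_data AS r
-- Step 1: one row per campaign; the nested arrays become ordinary columns.
CROSS JOIN UNNEST(r.campaign)
    AS c (campaignid, campaign_start_at, campaign_ends_at, parameters, campaignstatus)
-- Step 2: one row per parameters entry of that campaign.
CROSS JOIN UNNEST(c.parameters) AS p (goal, reward)
-- Step 3: one row per campaignstatus entry of that campaign.
CROSS JOIN UNNEST(c.campaignstatus) AS cs (seen, seen_at, action, action_at);
```

Two caveats. A CROSS JOIN UNNEST drops any row whose array is empty or NULL, so a campaign without status entries would disappear from the result. Presto's UNNEST also accepts several arrays at once, e.g. UNNEST(c.parameters, c.campaignstatus), but that form pairs elements by position and pads the shorter array with NULLs, so it would not repeat goal and reward on every status row the way the desired output does.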
Source: https://stackoverflow.com/questions/59567759/how-to-perform-multiple-array-unnest-in-parallel-in-presto