How to perform multiple array unnest() in parallel in Presto

Submitted by 与世无争的帅哥 on 2020-03-26 05:06:08

Question


I have a table in the following format:

create table raw_data (
userid BIGINT,
city  VARCHAR,
campaign ARRAY <
       STRUCT <campaignid BIGINT,
               campaign_start_at TIMESTAMP,
               campaign_ends_at TIMESTAMP,
               parameters ARRAY <
                           STRUCT < goal VARCHAR,
                                    reward VARCHAR
                                  >
                                >,
               campaignstatus ARRAY <
                           STRUCT < seen BOOLEAN,
                                    seen_at TIMESTAMP,
                                    action VARCHAR,
                                    action_at TIMESTAMP
                                  >
                                >
              >
         >
)

I want the final result to be like this:

userid | city   | campaignid | campaign_start_at | campaign_ends_at | goal | reward | seen | seen_at          | action      | action_at
1      | Athens | 234        | 2019-03-19 12:00  | 2019-03-19 14:00 | 10   | 2.7    | yes  | 2019-03-19 10:23 | null        | null
1      | Athens | 234        | 2019-03-19 12:00  | 2019-03-19 14:00 | 10   | 2.7    | yes  | 2019-03-17 10:23 | participate | 2019-03-19 11:20
2      | Athens | 234        | 2019-03-19 12:00  | 2019-03-19 14:00 | 10   | 2.7    | yes  | 2019-03-19 10:23 | ignore      | 2019-03-19 10:10
3      | Athens | 234        | 2019-03-19 12:00  | 2019-03-19 14:00 | 10   | 2.7    | null | null             | null        | null
3      | Athens | 234        | 2019-03-19 12:00  | 2019-03-19 14:00 | 10   | 2.7    | yes  | 2019-03-19 12:23 | blocked     | 2019-03-19 12:24

In other words, I want to unnest the data so that I can analyze it at the userid level. I tried to unnest the table with the following script:

select * 
FROM raw_data 
LEFT JOIN UNNEST(campaign) as t(campaigns)

but it returns the error: Table hive.default.campaign does not exist

My questions are:

Is it possible to unnest multiple arrays in parallel in Presto?

  • If yes, how do I do that?
  • If not, in what order should I unnest the columns to get to the top level (userid), e.g. inside-out or outside-in? An example would be much appreciated.

Answer 1:


So basically I found a solution. It is rather simple, but it works.

To unnest all the nested arrays, you need to work from the outer array towards the inner arrays. For this example:

  • first, unnest the campaign array based on userid
  • second, unnest the campaignstatus array based on userid and campaignid
  • third, handle the parameters array. Important note: the parameters array can be treated as a single object rather than unnested, since all of its values are strings and can be read with JSON functions.
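
The outer-to-inner order can be tried on a small inline data set before running it against the real table. This is only a sketch: the inline rows, the reduced field list, and the aliases r, c and cs are made up for illustration, and it assumes a Presto version in which UNNEST on an array of ROW values expands each field into its own column.

```sql
-- Hypothetical inline data standing in for raw_data (only a few fields kept).
with raw_data as (
  select *
  from (
    values
      (1, 'Athens',
       array[
         row(234,
             array[row(true, cast('participate' as varchar)),
                   row(true, cast('ignore' as varchar))])
       ])
  ) as v (userid, city, campaign)
)
select r.userid, r.city, c.campaignid, cs.seen, cs.action
from raw_data r
-- step 1: one row per campaign; each ROW field becomes a column
cross join unnest(r.campaign) as c (campaignid, campaignstatus)
-- step 2: one row per status entry of that campaign
cross join unnest(c.campaignstatus) as cs (seen, action)
```

Under that assumption, the query should return one row per campaignstatus entry, with userid, city and campaignid repeated on each row.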

More specifically, the query will be like this:

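select
   a.userid
   ,a.city
   ,a.campaignid
   ,a.campaign_start_at
   ,a.campaign_ends_at
   ,cs.seen
   ,cs.seen_at
   ,cs.action
   ,cs.action_at
   -- parameters is read through JSON functions instead of being unnested
   ,json_array_get(cast(a.parameters as json),0) goal
   ,json_array_get(cast(a.parameters as json),1) reward

from (
       -- step 1: one row per userid and campaign
       select
       userid
      ,city
      ,campaignid
      ,campaign_start_at
      ,campaign_ends_at
      ,parameters
      ,campaignstatus

      from raw_data
      cross join unnest(campaign) as c (campaignid, campaign_start_at, campaign_ends_at, parameters, campaignstatus)
   ) a

-- step 2: one row per campaignstatus entry of each campaign
cross join unnest(a.campaignstatus) as cs (seen, seen_at, action, action_at)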

However, I would love to read more sophisticated solutions.
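
On the "in parallel" part of the question: UNNEST in Presto also accepts several arrays in a single call, which expands them side by side rather than one after the other. The sketch below is not from the answer above; the inline data and the names v, goals, actions and t are invented for illustration.

```sql
-- Hypothetical inline data: two sibling arrays of different length.
select userid, goal, action
from (
  values
    (1, array['10'], array['participate', 'ignore'])
) as v (userid, goals, actions)
-- both arrays are expanded position by position in one call
cross join unnest(goals, actions) as t (goal, action)
```

When the arrays differ in length, the shorter one is padded with NULL, so here the second row has a NULL goal. Chaining two separate cross join unnest clauses instead produces every combination of the two arrays, which is closer to the expected result in the question, where goal and reward repeat on each status row.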



Source: https://stackoverflow.com/questions/59567759/how-to-perform-multiple-array-unnest-in-parallel-in-presto
