How to cross join unnest a JSON array in Presto

Anonymous (unverified), submitted 2019-12-03 02:26:02

Question:

Given a table that contains a column of JSON like this:

{"payload":[{"type":"b","value":"9"}, {"type":"a","value":"8"}]}
{"payload":[{"type":"c","value":"7"}, {"type":"b","value":"3"}]}

How can I write a Presto query to give me the average b value across all entries?

So far I think I need to use something like Hive's lateral view explode, whose equivalent is cross join unnest in Presto.

But I'm stuck on how to write the Presto query for cross join unnest.

How can I use cross join unnest to expand all array elements and select them?

Answer 1:

As you pointed out, this was finally implemented in Presto 0.79. :)

Here is an example of the cast syntax:

select cast(cast ('[1,2,3]' as json) as array<bigint>); 

A word of advice: unlike Hive, Presto has no 'string' type. If your array contains strings, make sure you use the type 'varchar'; otherwise you get an error message saying 'type array does not exist', which can be misleading.

select cast(cast ('["1","2","3"]' as json) as array<varchar>); 
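To connect the cast back to the question, here is a minimal sketch that unnests the resulting array into rows. It assumes a more recent Presto release than 0.79, where json_parse is used to parse the text because CAST(varchar AS JSON) there produces a JSON string rather than a parsed array. The names src, js, t and n are made up for illustration.

```sql
-- Hypothetical one-row input; src and js are illustrative names.
SELECT t.n
FROM (VALUES ('[1,2,3]')) AS src(js)
CROSS JOIN UNNEST(
    -- json_parse turns the text into a JSON array, then the cast yields ARRAY(BIGINT)
    CAST(json_parse(js) AS ARRAY(BIGINT))
) AS t(n);
```

Each array element comes back as its own row, which is the behavior Hive's lateral view explode provides.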


Answer 2:

The problem was that I was running an old version of Presto.

unnest was added in version 0.79

https://github.com/facebook/presto/blob/50081273a9e8c4d7b9d851425211c71bfaf8a34e/presto-docs/src/main/sphinx/release/release-0.79.rst



Answer 3:

Here's an example of that:

WITH example(message) AS (
    VALUES
        (json '{"payload":[{"type":"b","value":"9"},{"type":"a","value":"8"}]}'),
        (json '{"payload":[{"type":"c","value":"7"}, {"type":"b","value":"3"}]}')
)
SELECT
    n.type,
    avg(n.value)
FROM example
CROSS JOIN UNNEST(
    CAST(
        JSON_EXTRACT(message, '$.payload')
        AS ARRAY(ROW(type VARCHAR, value INTEGER))
    )
) AS x(n)
WHERE n.type = 'b'
GROUP BY n.type

WITH defines a common table expression (CTE) named example with a single column aliased as message.

VALUES returns a verbatim table rowset.

UNNEST takes an array held in a column of a single row and returns the elements of the array as multiple rows.

CAST changes the JSON type into the ARRAY type that UNNEST requires. It could just as easily have been an ARRAY<MAP<...>>, but I find ARRAY(ROW(...)) nicer because you can specify column names and use dot notation in the select clause.

JSON_EXTRACT uses a JSONPath expression to return the array value of the payload key.

avg() and group by should be familiar SQL.
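For comparison, the ARRAY-of-MAP alternative mentioned above might look like the sketch below. It reuses the same sample data. The alias m and the column name avg_value are illustrative, and the explicit cast to INTEGER is needed here because every map value is read as VARCHAR.

```sql
WITH example(message) AS (
    VALUES
        (json '{"payload":[{"type":"b","value":"9"},{"type":"a","value":"8"}]}'),
        (json '{"payload":[{"type":"c","value":"7"},{"type":"b","value":"3"}]}')
)
SELECT
    m['type'] AS type,
    -- map values are VARCHAR, so convert before averaging
    avg(CAST(m['value'] AS INTEGER)) AS avg_value
FROM example
CROSS JOIN UNNEST(
    CAST(json_extract(message, '$.payload') AS ARRAY(MAP(VARCHAR, VARCHAR)))
) AS x(m)
WHERE m['type'] = 'b'
GROUP BY m['type'];
```

Both forms average the two b values, 9 and 3. The map version trades dot notation for subscript lookups and loses the per-field types that ROW lets you declare.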


