可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
Given a table that contains a column of JSON like this:
{"payload":[{"type":"b","value":"9"}, {"type":"a","value":"8"}]} {"payload":[{"type":"c","value":"7"}, {"type":"b","value":"3"}]}
How can I write a Presto query to give me the average b
value across all entries?
So far I think I need to use something like Hive's lateral view explode, whose equivalent is cross join unnest in Presto.
But I'm stuck on how to write the Presto query for cross join unnest
.
How can I use cross join unnest
to expand all array elements and select them?
回答1:
As you pointed out, this was finally implemented in Presto 0.79. :)
Here is an example of the syntax for the cast from here:
select cast(cast ('[1,2,3]' as json) as array<bigint>);
Special word of advice, there is no 'string' type in Presto like there is in Hive. That means if your array contains strings make sure you use type 'varchar' otherwise you get an error msg saying 'type array does not exist' which can be misleading.
select cast(cast ('["1","2","3"]' as json) as array<varchar>);
回答2:
回答3:
Here's an example of that
with example(message) as ( VALUES (json '{"payload":[{"type":"b","value":"9"},{"type":"a","value":"8"}]}'), (json '{"payload":[{"type":"c","value":"7"}, {"type":"b","value":"3"}]}') ) SELECT n.type, avg(n.value) FROM example CROSS JOIN UNNEST( CAST( JSON_EXTRACT(message,'$.payload') as ARRAY(ROW(type VARCHAR, value INTEGER)) ) ) as x(n) WHERE n.type = 'b' GROUP BY n.type
with
defines a common table expression (CTE) names example
with a column aliased as message
VALUES
returns a verbatim table rowset
UNNEST
is taking an array within a column of a single row and returning the elements of the array as multiple rows.
CAST
is changing the JSON
type into an ARRAY
type that is required for UNNEST
. It could easily have been an ARRAY<MAP<
but I find ARRAY(ROW(
nicer as you can specify column names, and use dot notation in the select clause.
JSON_EXTRACT
is using a jsonPath expression to return the array value of the payload
key
avg()
and group by
should be familiar SQL.