How can I apply aggregate functions to data extracted from JSON in Google BigQuery?

Submitted by 梦想的初衷 on 2019-12-10 13:47:25

Question


I am extracting JSON data from a BigQuery column using JSON_EXTRACT. Now I want to extract lists of values and run aggregate functions (like AVG) against them. Testing the JsonPath expression $.objects[*].v succeeds on http://jsonpath.curiousconcept.com/. However, the query:

SELECT
  JSON_EXTRACT(json_column, "$.id") as id,
  AVG(JSON_EXTRACT(json_column, "$.objects[*].v")) as average_value
FROM [tablename]

throws a JsonPath parse error on BigQuery. Is this possible on BigQuery? Or do I need to preprocess my data in order to run aggregate functions against data inside of my JSON?

My data looks similar to this:

# Record 1
{
  "id": "abc",
  "objects": [
    {
      "id": 1,
      "v": 1
    },
    {
      "id": 2,
      "v": 3
    }
  ]
}
# Record 2
{
  "id": "def",
  "objects": [
    {
      "id": 1,
      "v": 2
    },
    {
      "id": 2,
      "v": 5
    }
  ]
}
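On the preprocessing route mentioned above, here is a minimal sketch in Python (not BigQuery SQL) of flattening the nested records into one row per entry in "objects" before loading. The output field name object_id and the inline sample strings are assumptions made for illustration; the flat rows are emitted as newline-delimited JSON, so that "v" would become an ordinary column that AVG can read directly.

```python
import json

# The two sample records from above, as JSON strings (inlined for illustration).
raw_records = [
    '{"id": "abc", "objects": [{"id": 1, "v": 1}, {"id": 2, "v": 3}]}',
    '{"id": "def", "objects": [{"id": 1, "v": 2}, {"id": 2, "v": 5}]}',
]


def flatten_record(record_text):
    """Turn one nested record into one flat row per entry in "objects"."""
    record = json.loads(record_text)
    return [
        # "object_id" is a hypothetical name chosen to avoid clashing with the record id.
        {"id": record["id"], "object_id": obj["id"], "v": obj["v"]}
        for obj in record["objects"]
    ]


flat_rows = [row for text in raw_records for row in flatten_record(text)]

# One JSON object per line (newline-delimited JSON).
ndjson = "\n".join(json.dumps(row) for row in flat_rows)
print(ndjson)
```

With the data in this shape, the aggregate no longer depends on any JSONPath wildcard support.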

This is related to another question.

Update: The problem can be simplified by running two queries. First, run JSON_EXTRACT and save the results into a view. Second, run the aggregate function against this view. Even then, the JsonPath expression $.objects[*].v still has to be corrected to avoid the JsonPath parse error.


Answer 1:


Use SPLIT() to turn the repeated objects into separate rows, one per array element. It is also easier and cleaner to do the extraction in a subquery and apply AVG in the outer query:

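SELECT id, AVG(v) AS average
FROM (
  SELECT
    JSON_EXTRACT(json_column, "$.id") AS id,
    INTEGER(
      REGEXP_EXTRACT(
        -- SPLIT() yields one row per element of the objects array
        SPLIT(JSON_EXTRACT(json_column, "$.objects"), "},{"),
        -- stop at a comma, brace or bracket so "v" also matches as the last key
        r'"v":([^,}\]]+)'
      )
    ) AS v
  FROM [mytable]
)
GROUP BY id;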
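To see what the split-then-extract idea does to the sample records, here is a rough emulation in Python (an illustration only, not BigQuery code). It assumes JSON_EXTRACT returns the array as compact JSON without whitespace, and it uses a pattern variant that stops at a comma, brace, or bracket, so it also matches when "v" is the last key in an object, as in the sample data. The helper names are hypothetical.

```python
import json
import re
from collections import defaultdict

records = [
    '{"id": "abc", "objects": [{"id": 1, "v": 1}, {"id": 2, "v": 3}]}',
    '{"id": "def", "objects": [{"id": 1, "v": 2}, {"id": 2, "v": 5}]}',
]

# Capture the value after "v": up to the next comma, closing brace, or bracket.
V_PATTERN = re.compile(r'"v":([^,}\]]+)')


def split_and_extract(record_text):
    """Emulate SPLIT(..., "},{") followed by REGEXP_EXTRACT on each fragment."""
    doc = json.loads(record_text)
    # Assumption: compact serialization stands in for the JSON_EXTRACT output.
    objects_text = json.dumps(doc["objects"], separators=(",", ":"))
    rows = []
    for fragment in objects_text.split("},{"):
        match = V_PATTERN.search(fragment)
        if match:
            rows.append((doc["id"], int(match.group(1))))
    return rows


def average_by_id(rows):
    """Emulate AVG(v) ... GROUP BY id."""
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return {key: sum(values) / len(values) for key, values in grouped.items()}


rows = [row for text in records for row in split_and_extract(text)]
averages = average_by_id(rows)
print(averages)  # {'abc': 2.0, 'def': 3.5}
```

Splitting on "},{" is a text-level trick: it relies on the objects being flat and on that exact separator, so nested objects or string values containing "},{" would break it.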


Source: https://stackoverflow.com/questions/26616094/how-can-i-apply-aggregate-functions-to-data-extracted-from-json-in-google-bigque
