Flatten nested JSON string to different columns in Google BigQuery

萝らか妹 submitted on 2020-01-15 12:27:14

Question


I have a column in one of my BigQuery tables that looks like this:

{"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, "session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}

Is there any way to get output like this in GBQ? (Basically, flatten the entire column into separate columns.)

name    last_delivered.push_id   last_delivered.time   session_id    source   properties.UserId
name1   push_id1                 time1                 session_id1   SDK      u1

Let's say

a = {"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, "session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}

I have tried to get the desired output in Python with pandas, using json_normalize(a), but every time I try, I get an error.

Does anyone have any idea how I can get my desired output? Am I missing something?

Any help would be greatly appreciated!!


Answer 1:


My guess as to why it's not working is that your JSON data is actually a string rather than a dict:

from pandas.io.json import json_normalize 

a = '''{"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, "session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}'''  

df = json_normalize(a)

Output:

AttributeError: 'str' object has no attribute 'values'    

Versus:

from pandas.io.json import json_normalize 

a = {"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, "session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}  

df = json_normalize(a)
print(df.to_string())

Output:

  last_delivered.push_id last_delivered.time   name properties.UserId   session_id source
0               push_id1               time1  name1                u1  session_id1    SDK

If this is the case, parse the string with json.loads() before normalizing:

import json
# pandas >= 1.0 exposes this as pandas.json_normalize; the pandas.io.json path is deprecated there
from pandas.io.json import json_normalize

a = '''{"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, "session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}'''

data = json.loads(a)       # str -> dict
df = json_normalize(data)  # nested keys become dotted column names
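
Since the question is about a whole column of JSON strings, every cell needs the same json.loads() step before flattening. Here is a minimal standard-library sketch of that idea. The flatten helper and the column list are hypothetical names made up for illustration, and the helper only handles nested dicts (no arrays), so it is a rough stand-in for what json_normalize does, not a replacement for it.

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with dotted keys (hypothetical helper)."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the dotted prefix along.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

# Assumption: the BigQuery column has been exported as a list of JSON strings.
column = [
    '{"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, '
    '"session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}',
]

# Parse each cell first (str -> dict), then flatten it.
rows = [flatten(json.loads(cell)) for cell in column]
print(rows[0])
```

If pandas is available, the resulting list of flat dicts can be passed to pandas.DataFrame(rows) to get one column per dotted key.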



Answer 2:


The example below is for BigQuery Standard SQL:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT '{"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, "session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}' col
)
SELECT 
  JSON_EXTRACT_SCALAR(col, '$.name') name,
  STRUCT(
    JSON_EXTRACT_SCALAR(col, '$.last_delivered.push_id') AS push_id,
    JSON_EXTRACT_SCALAR(col, '$.last_delivered.time') AS time
  ) last_delivered,
  JSON_EXTRACT_SCALAR(col, '$.session_id') session_id,
  JSON_EXTRACT_SCALAR(col, '$.source') source,
  STRUCT(
    JSON_EXTRACT_SCALAR(col, '$.properties.UserId') AS UserId
  ) properties
FROM `project.dataset.table`   

It produces the requested result:

Row name    last_delivered.push_id  last_delivered.time session_id  source  properties.UserId    
1   name1   push_id1                time1               session_id1 SDK     u1     
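
For JSON with many keys, writing one JSON_EXTRACT_SCALAR call per field by hand gets tedious. As a rough sketch, the select list could be generated from a sample document with a few lines of Python. The select_list helper is hypothetical, and it assumes that every row has the same shape as the sample, that keys are plain identifiers, and that leaf values are scalars (arrays are not handled). Unlike the query above, it emits flat underscore-separated aliases instead of STRUCTs.

```python
import json

def select_list(sample, column="col", path="$", prefix=""):
    """Yield one JSON_EXTRACT_SCALAR expression per leaf key (hypothetical helper)."""
    for key, value in sample.items():
        key_path = f"{path}.{key}"
        alias = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Nested object: descend and extend both the JSONPath and the alias.
            yield from select_list(value, column, key_path, alias)
        else:
            yield f"JSON_EXTRACT_SCALAR({column}, '{key_path}') AS {alias}"

# Sample document taken from the question.
sample = json.loads(
    '{"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, '
    '"session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}'
)

sql = "SELECT\n  " + ",\n  ".join(select_list(sample)) + "\nFROM `project.dataset.table`"
print(sql)
```

The generated text can then be pasted into the BigQuery console and adjusted by hand, for example to regroup fields into STRUCTs as in the query above.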


Source: https://stackoverflow.com/questions/55244750/flatten-nested-json-string-to-different-columns-in-google-bigquery
