I would like to know the best practice for reading a newline-delimited JSON file into a DataFrame. Critically, one of the (required) fields in each record maps to an
I recommend looking into Rumble to query heterogeneous JSON datasets on Spark when they do not fit neatly into DataFrames. This is precisely the problem it solves, and it is free and open-source.
For example:
for $i in json-file("s3://bucket/path/to/newline_separated_json.txt")
where keys($i.data) = "key2"  (: keep only those objects that have a key2 :)
group by $type := $i.type
return {
  "type" : $type,
  "key2-values" : [ $i.data.key2 ]
}
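If you save the query to a file (the name my-query.jq below is only for illustration), you can run it on your cluster by passing the Rumble jar to spark-submit. Treat this as a sketch: the jar name and the exact flag names depend on the Rumble release you download, so check the documentation for your version:
spark-submit spark-rumble.jar --query-path my-query.jq --output-path s3://bucket/path/to/output
Rumble also offers an interactive shell mode, which is handy for trying the query on a small sample before running it over the full dataset.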
(Disclaimer: I am part of the team.)