Query A Nested Array in Parquet Records


It seems that you can use

org.apache.spark.sql.functions.explode(e: Column): Column

For example, in my project (in Java), I have nested JSON like this:

{
    "error": [],
    "trajet": [
        {
            "something": "value"
        }
    ],
    "infos": [
        {
            "something": "value"
        }
    ],
    "timeseries": [
        {
            "something_0": "value_0",
            "something_1": "value_1",
            ...
            "something_n": "value_n"
        }
    ]
}

I wanted to analyse the data in "timeseries", so I did:

DataFrame ts = jsonDF.select(org.apache.spark.sql.functions.explode(jsonDF.col("timeseries")).as("t"))
                     .select("t.something_0",
                             "t.something_1",
                             ...
                             "t.something_n");

I'm new to Spark too. Hope this gives you a hint.
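
For context, here is a rough, self-contained sketch of how a DataFrame like jsonDF might be built in the first place. The file name data.json is hypothetical, and Spark's JSON reader expects one record per line:

    // Sketch only: "data.json" is a placeholder for wherever the JSON above is stored,
    // and something_0 / something_1 are just the example field names.
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;
    import static org.apache.spark.sql.functions.explode;

    public class ExplodeTimeseries {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("explode-timeseries").setMaster("local[*]");
            JavaSparkContext jsc = new JavaSparkContext(conf);
            SQLContext sqlc = new SQLContext(jsc);

            // read().json() infers a schema in which "timeseries" is an array of structs,
            // which is what makes explode() applicable to that column
            // (note: Spark's JSON reader expects one JSON record per line)
            DataFrame jsonDF = sqlc.read().json("data.json");
            jsonDF.printSchema();

            // explode() emits one output row per element of the "timeseries" array
            DataFrame ts = jsonDF.select(explode(jsonDF.col("timeseries")).as("t"))
                                 .select("t.something_0", "t.something_1");
            ts.show();
        }
    }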

The problem was eventually solved as follows:

I found a way through explode:

val results = sqlc.sql("SELECT firstname, child.name FROM parent LATERAL VIEW explode(children) childTable AS child")
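
For a runnable sketch of the surrounding setup, assuming the data lives in a Parquet file named parent.parquet with a firstname column and a children array of structs carrying a name field (the names are taken from the query above), the same result can also be produced with the DataFrame API's explode, which the first answer used and which avoids depending on the HiveQL parser:

    // Sketch only: the file name "parent.parquet" and the fields "firstname",
    // "children" and "name" are assumed from the query above.
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;
    import static org.apache.spark.sql.functions.explode;

    public class ExplodeChildren {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("explode-children").setMaster("local[*]");
            JavaSparkContext jsc = new JavaSparkContext(conf);
            SQLContext sqlc = new SQLContext(jsc);

            // "children" is read back from Parquet as array<struct<name:string, ...>>
            DataFrame parent = sqlc.read().parquet("parent.parquet");

            // explode() emits one row per element of the children array, aliased as "child"
            DataFrame results = parent.select(parent.col("firstname"),
                                              explode(parent.col("children")).as("child"))
                                      .select("firstname", "child.name");
            results.show();
        }
    }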