Pyspark: Parse a column of json strings

忘掉有多难 2020-11-27 15:25

I have a pyspark dataframe consisting of one column, called json, where each row is a unicode string of json. I'd like to parse each row and return a new dataframe where each row is the parsed json.
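
For concreteness, a minimal sketch of such a dataframe (the JSON values here are made up for illustration):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # One string column named json, each row holding a raw JSON document.
    df = spark.createDataFrame(
        [('{"a": 1.0, "b": 1}',), ('{"a": 0.0, "b": 2}',)],
        ["json"],
    )
    df.show(truncate=False)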

4 Answers
  •  刺人心 2020-11-27 15:46

    Here's a concise (Spark SQL) version of @nolan-conaway's parseJSONCols function.

    SELECT
    explode(
        from_json(
            concat('{"data":',
                   '[{"a": 1.0,"b": 1},{"a": 0.0,"b": 2}]',
                   '}'),
            'data array<struct<a:DOUBLE, b:INT>>'
        ).data) as data;
    

    PS. I've added the explode function as well :P

    You'll need to know some Hive SQL types.
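
    For comparison, here's a sketch of the same parse in the DataFrame API, with the array<struct> schema spelled out in Python types instead of a DDL string (the single-column dataframe from the question is assumed):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import (ArrayType, DoubleType, IntegerType,
                                   StructField, StructType)

    spark = SparkSession.builder.getOrCreate()

    # Schema matching the sample payload: an array of {a: double, b: int} objects.
    schema = ArrayType(StructType([
        StructField("a", DoubleType()),
        StructField("b", IntegerType()),
    ]))

    df = spark.createDataFrame(
        [('[{"a": 1.0,"b": 1},{"a": 0.0,"b": 2}]',)], ["json"]
    )

    parsed = (df
        .select(F.explode(F.from_json("json", schema)).alias("data"))
        .select("data.*"))  # flatten the struct into columns a and b
    parsed.show()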
