Transform nested dictionary key values to pyspark dataframe

Submitted by 自作多情 on 2020-08-26 03:12:21

Question


I have a PySpark dataframe that looks like this:

I would like to extract the nested dictionaries in the "dic" column and transform them into a PySpark dataframe, like this:

Please let me know how I can achieve this.

Thanks!


Answer 1:


from pyspark.sql import functions as F

df.show(truncate=False)  # sample dataframe (truncate=False prints the full dic string)

+---------+----------------------------------------------------------------------------------------------------------+
|timestmap|dic                                                                                                       |
+---------+----------------------------------------------------------------------------------------------------------+
|timestamp|{"Name":"David","Age":"25","Location":"New York","Height":"170","fields":{"Color":"Blue","Shape":"round"}}|
+---------+----------------------------------------------------------------------------------------------------------+

For Spark 2.4+, you can use from_json together with schema_of_json, which infers a schema from a sample JSON string (here, the first row of the dic column).

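# Infer the schema from the JSON string in the first row of dic.
schema=df.select(F.schema_of_json(df.select("dic").first()[0])).first()[0]

# Parse dic into a struct, expand it into columns, then expand the nested fields struct.
df.withColumn("dic", F.from_json("dic", schema))\
  .selectExpr("dic.*").selectExpr("*","fields.*").drop("fields").show()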

#+---+------+--------+-----+-----+-----+
#|Age|Height|Location| Name|Color|Shape|
#+---+------+--------+-----+-----+-----+
#| 25|   170|New York|David| Blue|round|
#+---+------+--------+-----+-----+-----+
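
To make the transformation easier to follow, here is a plain-Python sketch of what happens to a single row: the JSON string is parsed and the keys of the nested "fields" object are lifted to the top level. It does not use Spark at all, and flatten_row is a hypothetical helper name chosen for illustration, not part of any library. The Spark output above lists the inferred top-level columns alphabetically, whereas this sketch simply keeps the key order of the JSON string.

```python
import json

def flatten_row(dic_json, nested_key="fields"):
    """Parse one JSON string and lift the keys of one nested object to the top level."""
    record = json.loads(dic_json)
    # Remove the nested object from the record; treat a missing key as empty.
    nested = record.pop(nested_key, None) or {}
    # Top-level keys first, then the nested keys, similar to selectExpr("*", "fields.*").
    return {**record, **nested}

# The sample value from the dic column shown above.
sample = '{"Name":"David","Age":"25","Location":"New York","Height":"170","fields":{"Color":"Blue","Shape":"round"}}'
flat = flatten_row(sample)
print(flat)
```

If a key appeared both at the top level and inside "fields", this sketch would keep the nested value, so it is only meant as a mental model of the flattening step.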

If you are on a version older than Spark 2.4, you can go through the RDD API with read.json instead. Converting the dataframe to an RDD carries a performance cost.

# Let read.json parse the JSON strings and infer the schema.
df1 = spark.read.json(df.rdd.map(lambda r: r.dic))

df1.select(*[x for x in df1.columns if x!='fields'], F.col("fields.*")).show()

#+---+------+--------+-----+-----+-----+
#|Age|Height|Location| Name|Color|Shape|
#+---+------+--------+-----+-----+-----+
#| 25|   170|New York|David| Blue|round|
#+---+------+--------+-----+-----+-----+
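
One difference between the two approaches that may matter in practice: the first snippet builds its schema from the first row only (df.select("dic").first()[0]), while read.json infers the schema from the rows it reads. As far as I know, from_json keeps only the keys present in the schema it is given, so keys that first appear in later rows would not become columns. The plain-Python sketch below only illustrates that difference in the set of keys seen; top_level_keys and the two sample rows are hypothetical and not taken from the question.

```python
import json

def top_level_keys(json_rows, first_row_only):
    """Collect the top-level keys seen in the first row only, or across every row."""
    rows = json_rows[:1] if first_row_only else json_rows
    keys = set()
    for row in rows:
        keys.update(json.loads(row).keys())
    return sorted(keys)

# Hypothetical rows: the second one has a key the first one lacks.
rows = [
    '{"Name":"David","Age":"25"}',
    '{"Name":"Ann","Age":"31","Weight":"60"}',
]
first_only = top_level_keys(rows, first_row_only=True)
all_rows = top_level_keys(rows, first_row_only=False)
print(first_only)  # keys a first-row schema would know about
print(all_rows)    # keys visible when every row is scanned
```

When every row has the same keys, as in the single-row sample above, both approaches see the same columns.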


Source: https://stackoverflow.com/questions/63004575/transform-nested-dictionary-key-values-to-pyspark-dataframe
