Flatten Nested Spark Dataframe

上瘾入骨i 2020-12-03 08:28

Is there a way to flatten an arbitrarily nested Spark Dataframe? Most of the work I'm seeing is written for a specific schema, and I'd like to be able to flatten any Dataframe generically.

4 Answers
  •  庸人自扰
    2020-12-03 09:19

    Here's my final approach:

    1) Map the rows in the dataframe to an RDD of dicts. Find suitable Python code online for flattening a dict; one possible sketch is shown after the skeleton below.

    flat_rdd = nested_df.rdd.map(flatten)  # go through .rdd, since PySpark DataFrames have no .map on Spark 2+
    

    where

    def flatten(x):
      x_dict = x.asDict()
      ...some flattening code...
      return x_dict
    
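    The "...some flattening code..." part is left open above; here is a minimal sketch of one way to fill it in (my own, not the answerer's), assuming struct nesting only and joining nested field names with an underscore so the flat column names don't need backticks:

    def flatten(row):
        # Convert the Row, including nested Rows, into a plain dict.
        x_dict = row.asDict(recursive=True)

        def _flatten(d, prefix=""):
            out = {}
            for key, value in d.items():
                name = prefix + key
                if isinstance(value, dict):
                    # Nested struct: recurse, joining names with "_".
                    out.update(_flatten(value, prefix=name + "_"))
                else:
                    # Scalars (and arrays, which this sketch leaves as-is).
                    out[name] = value
            return out

        return _flatten(x_dict)
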

    2) Convert the RDD[dict] back to a dataframe

    flat_df = sqlContext.createDataFrame(flat_rdd)
    
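    For reference, a hedged end-to-end example of the two steps on a made-up schema (the column names and the use of a SparkSession rather than sqlContext are my assumptions, not part of the answer):

    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical input with one nested struct column.
    nested_df = spark.createDataFrame([
        Row(name="alice", address=Row(city="Oslo", zip="0150")),
        Row(name="bob", address=Row(city="Paris", zip="75001")),
    ])

    # Step 1: flatten each Row into a flat dict.
    flat_rdd = nested_df.rdd.map(flatten)

    # Step 2: infer a flat schema from the dicts.
    # (Newer Spark warns that inferring a schema from dicts is deprecated;
    #  mapping each dict d to Row(**d) first avoids the warning.)
    flat_df = spark.createDataFrame(flat_rdd)
    flat_df.show()
    # Expect columns like address_city, address_zip, name (order may vary).
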
