Is there a way to flatten an arbitrarily nested Spark DataFrame? Most of the work I'm seeing is written for a specific schema, and I'd like to be able to generically flatten a DataFrame with an arbitrary schema.
Here's my final approach:
1) Map the rows in the DataFrame to an RDD of dicts. Suitable Python code for flattening a dict is easy to find online (one possible sketch follows the skeleton below).
flat_rdd = nested_df.rdd.map(lambda x: flatten(x))
where
def flatten(x):
    x_dict = x.asDict()
    # ...some flattening code...
    return x_dict
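As one way to fill in that skeleton, here is a minimal recursive flattener. The helper name flatten_dict and the underscore separator for compound keys are my own choices, and it only handles nested structs; array columns would need an explode() first.

from pyspark.sql import Row

def flatten_dict(d, prefix=''):
    # Recursively walk the dict, promoting nested fields to the
    # top level under compound keys like 'parent_child'.
    out = {}
    for key, value in d.items():
        full_key = prefix + '_' + key if prefix else key
        if isinstance(value, Row):
            # asDict() is not recursive by default, so nested Rows
            # come through as Rows and must be converted too
            value = value.asDict()
        if isinstance(value, dict):
            out.update(flatten_dict(value, full_key))
        else:
            out[full_key] = value
    return out

def flatten(x):
    return flatten_dict(x.asDict())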
2) Convert the RDD[dict] back to a dataframe
flat_df = sqlContext.createDataFrame(flat_rdd)
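To sanity-check end to end, here's a tiny made-up example (in Spark 2+, a SparkSession works in place of sqlContext):

nested_df = sqlContext.createDataFrame([Row(a=1, b=Row(c=2, d=3))])
flat_df = sqlContext.createDataFrame(nested_df.rdd.map(flatten))
flat_df.columns   # ['a', 'b_c', 'b_d']

One caveat: newer Spark versions warn that inferring a schema from dicts is deprecated, so you may want to convert each flattened dict back into a Row (e.g. Row(**d)) before calling createDataFrame.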