Flatten Nested Spark Dataframe

上瘾入骨i 2020-12-03 08:28

Is there a way to flatten an arbitrarily nested Spark Dataframe? Most of the work I'm seeing is written for a specific schema, and I'd like to be able to flatten any Dataframe generically.

4 Answers
  •  庸人自扰
    2020-12-03 09:19

    Here's my final approach:

    1) Map the rows in the dataframe to an RDD of dicts. Find suitable Python code online for flattening a dict; one possible sketch is shown after the skeleton below.

    flat_rdd = nested_df.rdd.map(flatten)  # go through .rdd, since PySpark DataFrames have no .map on Spark 2+
    

    where

    def flatten(x):
      x_dict = x.asDict()
      ...some flattening code...
      return x_dict
    
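    The "...some flattening code..." part is left open above; here is a minimal sketch of one way to fill it in (my own, not the answerer's), assuming struct nesting only and joining nested field names with an underscore so the flat column names don't need backticks:

    def flatten(row):
        # Convert the Row, including nested Rows, into a plain dict.
        x_dict = row.asDict(recursive=True)

        def _flatten(d, prefix=""):
            out = {}
            for key, value in d.items():
                name = prefix + key
                if isinstance(value, dict):
                    # Nested struct: recurse, joining names with "_".
                    out.update(_flatten(value, prefix=name + "_"))
                else:
                    # Scalars (and arrays, which this sketch leaves as-is).
                    out[name] = value
            return out

        return _flatten(x_dict)
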

    2) Convert the RDD[dict] back to a dataframe

    flat_df = sqlContext.createDataFrame(flat_rdd)
    
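    For reference, a hedged end-to-end example of the two steps on a made-up schema (the column names and the use of a SparkSession rather than sqlContext are my assumptions, not part of the answer):

    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical input with one nested struct column.
    nested_df = spark.createDataFrame([
        Row(name="alice", address=Row(city="Oslo", zip="0150")),
        Row(name="bob", address=Row(city="Paris", zip="75001")),
    ])

    # Step 1: flatten each Row into a flat dict.
    flat_rdd = nested_df.rdd.map(flatten)

    # Step 2: infer a flat schema from the dicts.
    # (Newer Spark warns that inferring a schema from dicts is deprecated;
    #  mapping each dict d to Row(**d) first avoids the warning.)
    flat_df = spark.createDataFrame(flat_rdd)
    flat_df.show()
    # Expect columns like address_city, address_zip, name (order may vary).
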
