How to flatten nested lists in PySpark?

Submitted by 半腔热情 on 2019-12-01 00:25:33

Question


I have an RDD structure like:

rdd = [[[1],[2],[3]], [[4],[5]], [[6]], [[7],[8],[9],[10]]]

and I want it to become:

rdd = [1,2,3,4,5,6,7,8,9,10]

How do I write a map or reduce function to make it work?


Answer 1:


You can, for example, use flatMap with a list comprehension (this assumes each innermost list contains exactly one element, as in your sample data):

rdd.flatMap(lambda xs: [x[0] for x in xs])

or, a little more generally (this handles innermost lists of any length):

from itertools import chain

rdd.flatMap(lambda xs: chain(*xs)).collect()
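Since flatMap applies the function to each element of the RDD and concatenates the resulting iterables, you can check the flattening logic in plain Python before handing it to Spark. A minimal local sketch (no SparkContext needed; `rdd_data` is the question's sample data as an ordinary list):

```python
from itertools import chain

# The question's nested structure, as a plain Python list:
rdd_data = [[[1], [2], [3]], [[4], [5]], [[6]], [[7], [8], [9], [10]]]

# Simulate flatMap locally: map each element through chain(*xs),
# then concatenate all the resulting iterables into one flat list.
flattened = list(chain.from_iterable(chain(*xs) for xs in rdd_data))
print(flattened)  # → [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Note that returning a lazy iterator like `chain(*xs)` from the lambda is fine: flatMap only needs each result to be iterable, and laziness avoids building intermediate lists on the executors.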


Source: https://stackoverflow.com/questions/34711149/how-to-flatten-nested-lists-in-pyspark
