Convert an RDD to iterable: PySpark?

Submitted by 百般思念 on 2019-12-30 17:26:12

Question


I have an RDD which I create by loading a text file and preprocessing it. I don't want to collect the entire dataset and hold it on disk or in memory; instead, I want to pass it to another Python function that consumes the data one element at a time, in the form of an iterable.

How is this possible?

data = sc.textFile('file.txt').map(lambda x: some_func(x))

an_iterable = data. ##  what should I do here to make it give me one element at a time?

def model1(an_iterable):
    for i in an_iterable:
        do_that(i)

model1(an_iterable)

Answer 1:


I believe what you want is toLocalIterator():
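
A minimal sketch of how that would slot into the code from the question (assuming `some_func` and `do_that` are defined elsewhere and `sc` is an existing SparkContext):

data = sc.textFile('file.txt').map(lambda x: some_func(x))

# toLocalIterator() returns a plain Python iterator over the RDD's elements,
# pulling partitions to the driver one at a time, so only a single partition
# needs to fit in driver memory rather than the whole dataset.
an_iterable = data.toLocalIterator()

def model1(an_iterable):
    for i in an_iterable:
        do_that(i)

model1(an_iterable)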




Answer 2:


data = sc.textFile('file.txt').map(lambda x: some_func(x))

# call collect() on the RDD to get its contents as a list, then loop over it
for i in data.collect():
    print(i)
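
Note that collect() materializes the entire RDD on the driver before the loop starts, so this only works when the full dataset fits in driver memory; toLocalIterator() from the first answer avoids that.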


Source: https://stackoverflow.com/questions/32771737/convert-an-rdd-to-iterable-pyspark
