pyspark: 'PipelinedRDD' object is not iterable

难免孤独 2020-12-18 07:56

I am getting this error but I do not know why. Basically, the error comes from this code:

    a = data.mapPartitions(helper(locations))

wher
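
For context, mapPartitions expects a function that takes an iterator over one partition and returns an iterator, so helper(locations) has to return such a function. Below is a minimal runnable sketch of that pattern; helper, locations, and data reuse the question's names, but their contents here are assumptions for illustration only.

    from pyspark import SparkContext

    sc = SparkContext("local", "mapPartitions-sketch")

    locations = {1, 3, 5}                      # illustrative lookup set
    data = sc.parallelize(range(10), 2)        # illustrative RDD with 2 partitions

    def helper(locations):
        # mapPartitions needs a function of one partition's iterator,
        # so helper(locations) builds and returns such a function
        def process_partition(rows):
            return (row for row in rows if row in locations)
        return process_partition

    a = data.mapPartitions(helper(locations))  # a is again an RDD, not a plain list
    print(a.collect())                         # [1, 3, 5]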

2 Answers
  •  再見小時候 2020-12-18 08:11

    An RDD can be iterated over using map and lambda functions. I iterated through a PipelinedRDD using the method below:

    # sc is the SparkContext (available by default in the pyspark shell)
    lines1 = sc.textFile("../file1.csv")
    lines2 = sc.textFile("../file2.csv")

    # key every line by its integer value and tag it with its source file
    pairs1 = lines1.map(lambda s: (int(s), 'file1'))
    pairs2 = lines2.map(lambda s: (int(s), 'file2'))

    pair_result = pairs1.union(pairs2)

    # merge the file tags of identical keys into one comma-separated string
    pair = pair_result.reduceByKey(lambda a, b: a + ',' + b)

    result = pair.map(lambda l: tuple(l[:1]) + tuple(l[1].split(',')))
    result_ll = [list(elem) for elem in result]

    The last line fails with:

    ===> result_ll = [list(elem) for elem in result]

    TypeError: 'PipelinedRDD' object is not iterable
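
    For reference, the comprehension fails because an RDD is a distributed, lazily evaluated handle rather than a Python iterable; iterating it directly only works on data already pulled to the driver, e.g. via collect() (a sketch, only sensible when the result is small):

    result_ll = [list(elem) for elem in result.collect()]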

    Instead, I replaced the list comprehension with a map transformation:

    result_ll = result.map(lambda elem: list(elem))
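
    Note that result_ll is still an RDD after this map; if an actual Python list is needed on the driver, it has to be materialized with something like take() or collect() (a small sketch, assuming the data fits in driver memory):

    print(result_ll.take(5))           # inspect the first five elements
    local_lists = result_ll.collect()  # full result as a list of lists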
    

    Hope this helps you modify your code accordingly.
