PySpark Evaluation

Backend · open · 2 answers · 1532 views
自闭症患者 asked 2020-12-03 16:18

I am trying the following code which adds a number to every row in an RDD and returns a list of RDDs using PySpark.

from pyspark.context import SparkContext         


        
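The rest of the snippet was not preserved, but the behaviour being described — building one mapped dataset per offset in a loop — can be reproduced without Spark at all. A hypothetical minimal sketch (the `data` rows and the use of plain lists in place of RDDs are assumptions for illustration):

```python
# Stand-in for the RDD: a plain list of string rows (assumption).
data = ["1", "2", "3"]

# Naive version: one mapping function per offset i.
funcs = [lambda x: int(x) + i for i in range(4)]

# Every lambda captures the *variable* i, not its value at creation
# time, so after the loop all of them see i == 3.
splits = [[f(row) for row in data] for f in funcs]
print(splits)  # every inner list is [4, 5, 6]
```

This is the symptom the question is about: all four "splits" come out identical instead of each carrying its own offset.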
2 Answers
  •  隐瞒了意图╮
    2020-12-03 16:31

    This is due to the fact that the lambdas capture the variable i by reference (late binding), not by value. It has nothing to do with Spark.

    You can try this:

    a = [(lambda y: (lambda x: y + int(x)))(i) for i in range(4)]
    splits = [data.map(a[x]) for x in range(4)]
    

    or in one line

    splits = [
        data.map([(lambda y: (lambda x: y + int(x)))(i) for i in range(4)][x])
        for x in range(4)
    ]
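    A more common idiom for the same fix, not from the answer above but standard Python, is to freeze the current loop value with a default argument instead of a nested lambda. Sketched here with a plain list standing in for the RDD (an assumption for illustration):

    ```python
    # Stand-in for the RDD: a plain list of string rows (assumption).
    data = ["1", "2", "3"]

    # i=i binds the *current* value of i as a default argument, so each
    # lambda keeps its own offset instead of sharing the final one.
    a = [lambda x, i=i: i + int(x) for i in range(4)]

    splits = [[f(row) for row in data] for f in a]
    print(splits)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
    ```

    Each split now carries a distinct offset, which is the behaviour the question was after.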
    
