I am trying the following code, which adds a number to every row in an RDD and returns a list of RDDs, using PySpark.
from pyspark.context import SparkContext
This is because Python lambdas refer to i by reference (late binding), not by value, so every lambda ends up seeing the final value of i after the loop finishes. It has nothing to do with Spark. See this
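Here is a minimal pure-Python sketch of the effect, with no Spark involved (the broken variant assumes the original code built its lambdas as lambda x: x + i inside the comprehension):

# All four lambdas close over the same variable i, not its value at creation time.
broken = [lambda x: x + i for i in range(4)]
print([f(0) for f in broken])  # [3, 3, 3, 3] -- every lambda sees the final i

# Immediately calling an outer lambda freezes the current value of i into y.
fixed = [(lambda y: (lambda x: x + y))(i) for i in range(4)]
print([f(0) for f in fixed])   # [0, 1, 2, 3]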
You can try this:
# Calling the outer lambda immediately binds the current value of i to y,
# so each inner lambda adds its own constant instead of the shared i.
a = [(lambda y: (lambda x: y + int(x)))(i) for i in range(4)]
splits = [data.map(a[x]) for x in range(4)]
or in one line:
splits = [
    data.map([(lambda y: (lambda x: y + int(x)))(i) for i in range(4)][x])
    for x in range(4)
]
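The default-argument idiom gives the same per-lambda value binding with less nesting. A minimal runnable sketch, assuming data is an RDD of numeric strings; the sc setup and the sample values are illustrative, not taken from the question:

from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()
data = sc.parallelize(["1", "2", "3"])  # hypothetical sample input

# i=i is evaluated once, at definition time, so each lambda keeps its own copy of i.
splits = [data.map(lambda x, i=i: i + int(x)) for i in range(4)]
print([rdd.collect() for rdd in splits])
# [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]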