I've just created a Python list with range(1, 100000).

Using SparkContext, I did the following steps:

a = sc.parallelize([i for i in range(1, 100000)])
Expanding on @leo9r's comment: consider using sc.range rather than a Python range:
https://spark.apache.org/docs/1.6.0/api/python/pyspark.html#pyspark.SparkContext.range.
That way you avoid transferring a huge list from your driver to the executors.
Of course, such RDDs are usually used for testing purposes only, so you do not want them to be broadcast.
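A minimal sketch of the difference, assuming a SparkContext already bound to sc as in the question:

# sc.range ships only the range bounds to the executors; the numbers
# are generated there, so the driver never materializes the full list.
rdd = sc.range(1, 100000)

# sc.parallelize on a materialized list builds all 99999 elements on
# the driver and serializes them out to the executors.
rdd_slow = sc.parallelize([i for i in range(1, 100000)])

print(rdd.count())       # 99999
print(rdd_slow.count())  # 99999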