I've just created a Python list of range(1, 100000):

    range(1, 100000)
Using the SparkContext, I then did the following:
    a = sc.parallelize([i for i in range(1, 100000)])
Spark automatically ships a copy of each variable referenced in a task together with that task, so a large variable gets re-sent for every single task. For variables of that size you may want to use Broadcast Variables instead, which are shipped to each executor only once and cached there.
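As a minimal sketch of that pattern (the lookup set and app name below are only illustrative, not from your code): you broadcast a plain Python value, not an RDD, and tasks read it through the broadcast object's .value attribute:

    from pyspark import SparkContext

    sc = SparkContext("local", "broadcast-sketch")  # illustrative app name

    # A large read-only structure that every task needs.
    lookup = set(range(1, 100000))

    # Ship it to each executor once, instead of once per task.
    b_lookup = sc.broadcast(lookup)

    rdd = sc.parallelize([5, 50, 500000])
    print(rdd.filter(lambda x: x in b_lookup.value).collect())  # [5, 50]

Note that trying to broadcast an RDD directly (e.g. the result of sc.parallelize) will not work; broadcast is for ordinary driver-side values.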
If you are still facing size problems, then perhaps this data should be an RDD in itself, as sketched below.
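For example, a rough sketch under the assumption that you want to match another dataset against the large one: keep the large data distributed as a pair RDD and use a join on the cluster, so nothing large is copied into task closures or collected on the driver (this reuses the same sc as above; the keys and values are illustrative):

    # Keep the large dataset distributed as a (key, value) pair RDD...
    big = sc.parallelize(range(1, 100000)).map(lambda x: (x, True))
    other = sc.parallelize([5, 50, 500000]).map(lambda x: (x, None))

    # ...and join on the cluster instead of shipping the data around.
    print(other.join(big).keys().collect())  # [5, 50], in some order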