My Spark application is using RDDs of numpy arrays.
At the moment, I'm reading my data from AWS S3, and it's represented as
a simple text file where each line is a vector.
The best thing to do under these circumstances is to use the pandas library for I/O.
Please refer to this question: pandas read_csv() and python iterator as input.
There you will see how to replace the np.loadtxt() function so it is much faster to create an RDD of numpy arrays.
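As a minimal sketch of the idea (the function name `lines_to_array` and the S3 path are hypothetical): parse each partition's lines with `pandas.read_csv`, which is considerably faster than `np.loadtxt` for plain-text numeric data, then hand the resulting rows back to Spark.

```python
import io

import numpy as np
import pandas as pd


def lines_to_array(lines):
    """Parse an iterable of whitespace-separated text lines into a 2-D numpy array.

    pandas.read_csv does the parsing in C, which is much faster than np.loadtxt.
    """
    buf = io.StringIO("\n".join(lines))
    df = pd.read_csv(buf, sep=r"\s+", header=None, dtype=np.float64)
    return df.to_numpy()


# Inside a Spark job this would be applied per partition, e.g. (assuming
# an existing SparkContext `sc` and a hypothetical S3 path):
# rdd = sc.textFile("s3://my-bucket/data.txt").mapPartitions(
#     lambda it: iter(lines_to_array(list(it))))

sample = ["1.0 2.0 3.0", "4.0 5.0 6.0"]
arr = lines_to_array(sample)
print(arr.shape)  # (2, 3)
```

The key point is that `read_csv` accepts any file-like object, so the lines of a partition can be joined into an in-memory buffer and parsed in one call instead of line by line.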