Spark: fastest way to create an RDD of numpy arrays

长情又很酷 · asked 2020-12-18 11:28

My Spark application uses RDDs of numpy arrays.
At the moment, I'm reading my data from AWS S3, where it's represented as a simple text file in which each line is a vector.
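
For illustration, a minimal sketch of this setup with a straightforward per-line parse; the bucket path and the space-delimited format are assumptions, not taken from the question:

```python
import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="numpy-rdd")

# Hypothetical S3 path; each line is assumed to be a space-delimited vector.
raw = sc.textFile("s3a://my-bucket/vectors.txt")

# Straightforward but slow: one Python-level parse call per line.
vectors = raw.map(lambda line: np.array(line.split(), dtype=np.float64))
```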

3 Answers
  •  别那么骄傲 · 2020-12-18 12:01

    The best thing to do in these circumstances is to use the pandas library for I/O.
    Please refer to this question: pandas read_csv() and python iterator as input.
    There you will see how to replace the np.loadtxt() function so that building an
    RDD of numpy arrays is much faster.
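
    A minimal sketch of that idea, assuming the same space-delimited format as above:
    hand each whole partition to pandas' C parser via mapPartitions() instead of
    parsing line by line. The path and the parse_partition helper name are illustrative:

```python
from io import StringIO

import numpy as np
import pandas as pd
from pyspark import SparkContext

sc = SparkContext(appName="numpy-rdd")

def parse_partition(lines):
    # Join the partition's text lines into one buffer and let pandas'
    # C parser handle them in a single call, instead of invoking
    # np.loadtxt() once per line.
    text = "\n".join(lines)
    if not text:  # skip empty partitions
        return
    df = pd.read_csv(StringIO(text), sep=" ", header=None, dtype=np.float64)
    # Yield one numpy array per original input line.
    for row in df.values:
        yield row

raw = sc.textFile("s3a://my-bucket/vectors.txt")  # hypothetical path
vectors = raw.mapPartitions(parse_partition)
print(vectors.first())
```

    Batching by partition amortizes the parser's start-up cost over many lines,
    which is where the speedup over per-line np.loadtxt() comes from.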
