Is it possible to create nested RDDs in Apache Spark?

甜味超标 · 2020-12-06 13:28

I am trying to implement the K-nearest-neighbor algorithm in Spark. I was wondering if it is possible to work with nested RDDs. That would make my life a lot easier. Consider t

2 Answers
  •  挽巷 (OP)
     2020-12-06 14:16

    No, it is not possible: the elements of an RDD must be serializable, and an RDD itself is not serializable. This makes sense; otherwise Spark might ship an entire RDD over the network, which is a problem if it holds a lot of data. And if it does not hold much data, you can, and should, use a plain array or a similar local collection instead.

    However, I don't know how you are implementing K-nearest neighbor, but be careful: if you do something like computing the distance between every pair of points, that approach does not scale with dataset size, because it is O(n²) in the number of points.
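    To make the cost concrete, here is a minimal pure-Python sketch of brute-force nearest-neighbor search (the function name and sample data are made up for illustration; this is not the asker's actual code). Each query scans all n points, so answering a query for every point in the dataset costs O(n²) distance computations. In Spark, the usual alternative to a nested RDD is to collect or broadcast the smaller reference set as a local array and map a function like this over the large RDD.

    ```python
    import math

    def knn(points, query, k):
        """Brute-force k-nearest neighbors: computes the distance from
        `query` to every point, then keeps the k closest. Running this
        once per point in a dataset of size n is O(n^2) overall."""
        return sorted(points, key=lambda p: math.dist(p, query))[:k]

    points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (1.5, 0.5)]
    print(knn(points, (1.0, 0.0), 2))  # the two points closest to (1.0, 0.0)
    ```

    With a broadcast variable, the driver would ship `points` once to each executor and the RDD's `map` would call a function like `knn` on each element, avoiding any RDD-inside-RDD construction.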
