Is it possible to create nested RDDs in Apache Spark?

甜味超标 · 2020-12-06 13:28

I am trying to implement the K-nearest-neighbor algorithm in Spark. I was wondering if it is possible to work with nested RDDs. That would make my life a lot easier. Consider t

2 Answers
  •  挽巷 (OP)
     2020-12-06 14:16

    No, it is not possible: the elements of an RDD must be serializable, and an RDD itself is not serializable. This makes sense; otherwise Spark might ship an entire RDD over the network, which is a problem if it holds a lot of data. And if it does not hold much data, you can, and should, use a plain array or a similar local collection instead.

    However, I don't know how you are implementing K-nearest neighbor, but be careful: if you do something like computing the distance between every pair of points, that approach does not scale with dataset size, because it is O(n²) in the number of points.
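    To make the cost concrete, here is a minimal pure-Python sketch of brute-force nearest-neighbor search (the function name and sample data are made up for illustration; this is not the asker's actual code). Each query scans all n points, so answering a query for every point in the dataset costs O(n²) distance computations. In Spark, the usual alternative to a nested RDD is to collect or broadcast the smaller reference set as a local array and map a function like this over the large RDD.

    ```python
    import math

    def knn(points, query, k):
        """Brute-force k-nearest neighbors: computes the distance from
        `query` to every point, then keeps the k closest. Running this
        once per point in a dataset of size n is O(n^2) overall."""
        return sorted(points, key=lambda p: math.dist(p, query))[:k]

    points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (1.5, 0.5)]
    print(knn(points, (1.0, 0.0), 2))  # the two points closest to (1.0, 0.0)
    ```

    With a broadcast variable, the driver would ship `points` once to each executor and the RDD's `map` would call a function like `knn` on each element, avoiding any RDD-inside-RDD construction.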
