Explicit sort in Cartesian transformation in Scala Spark

Submitted by 你。 on 2019-12-02 01:49:53

If you need to identify each point (so you can tell which pair of points each L2 distance belongs to), what you really need is to add an id to each entry in the RDD or DataFrame.

If you want to use an RDD, this is the approach I recommend (shown in PySpark):

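# Assumes a pyspark shell, where the SparkContext `sc` already exists
myRDD = sc.parallelize([(0, (0.0, 0.0)), (1, (2.0, 0.0)),
                        (2, (-3.0, 2.0)), (3, (-6.0, -4.0))])

# cartesian() yields every ordered pair; its partition count is the product
# of the inputs' partition counts, so coalesce to fewer partitions
combinations = myRDD.cartesian(myRDD).coalesce(32)

def l2_distance(pair):
    # Tuple parameters in lambdas are Python 2 only, so unpack here instead
    (id1, (x1, y1)), (id2, (x2, y2)) = pair
    return (id1, id2, ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5)

# Keep each unordered pair once (id1 < id2), then compute its L2 distance
distances = (combinations
             .filter(lambda pair: pair[0][0] < pair[1][0])
             .map(l2_distance))

distances.collect()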
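To check the Spark result on a small input without a cluster, the same logic can be sketched in plain Python. This is a local stand-in, not part of the original answer: enumerate plays the role of attaching ids (on an RDD, zipWithIndex() can do this, though it yields (element, index) rather than (index, element)), itertools.product plays the role of cartesian(), and math.hypot computes the L2 distance. The function name pairwise_distances is hypothetical.

```python
import itertools
import math

def pairwise_distances(points):
    """Local sketch of the RDD pipeline: id each point, take the
    Cartesian product, keep each unordered pair once, compute L2."""
    # Attach an id to each point, like the (id, (x, y)) entries in the RDD
    indexed = list(enumerate(points))
    result = []
    for (id1, (x1, y1)), (id2, (x2, y2)) in itertools.product(indexed, indexed):
        # Same filter as the RDD version: drops self-pairs and (j, i) duplicates
        if id1 < id2:
            result.append((id1, id2, math.hypot(x1 - x2, y1 - y2)))
    return result

points = [(0.0, 0.0), (2.0, 0.0), (-3.0, 2.0), (-6.0, -4.0)]
for row in pairwise_distances(points):
    print(row)
```

As in the Spark version, all n*n ordered pairs are generated before filtering, and n*(n-1)/2 rows remain (6 for the four points here).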

Have you tried the sorted method? It sorts tuples lexicographically: by their first element, then by the second, and so on:

scala> val a = Array((1, 1), (3, 3), (2, 2))
a: Array[(Int, Int)] = Array((1,1), (3,3), (2,2))

scala> a.sorted
res1: Array[(Int, Int)] = Array((1,1), (2,2), (3,3))

scala> val a = Array((1, 2), (3, 1), (2, 3))
a: Array[(Int, Int)] = Array((1,2), (3,1), (2,3))

scala> a.sorted
res2: Array[(Int, Int)] = Array((1,2), (2,3), (3,1))

scala> val a = Array((1, 2), (3, 1), (1, 1))
a: Array[(Int, Int)] = Array((1,2), (3,1), (1,1))

scala> a.sorted
res3: Array[(Int, Int)] = Array((1,1), (1,2), (3,1))
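Python tuples are ordered the same way, element by element, so in a PySpark pipeline like the one above the ordering can be used explicitly on the Cartesian pairs: sorting each (a, b) pair makes (b, a) and (a, b) identical, after which duplicates can be dropped. A minimal local sketch, assuming the ids are mutually comparable; the helper name canonical_pairs is hypothetical. On an RDD, the analogous steps would be map, distinct() and sortBy().

```python
def canonical_pairs(pairs):
    """Hypothetical helper: put each pair in ascending order, drop self-pairs
    and duplicates, and return the result sorted lexicographically."""
    # sorted() on a 2-tuple returns its two elements in ascending order
    canonical = {tuple(sorted(p)) for p in pairs if p[0] != p[1]}
    # Tuples compare by first element, then second, like the Scala example
    return sorted(canonical)

ids = [2, 0, 1]
cartesian = [(a, b) for a in ids for b in ids]
print(canonical_pairs(cartesian))  # [(0, 1), (0, 2), (1, 2)]
```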