Spark: produce RDD[(X, X)] of all possible combinations from RDD[X]

刺人心 2020-11-30 03:59

Is it possible in Spark to implement the `.combinations` function from Scala collections?

   /** Iterates over combinations.
    *
    *  @return   An Iterator which traverses the possible n-element
    *            combinations of this collection.
    */
   def combinations(n: Int): Iterator[List[A]]
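For reference, this is what `.combinations` does on an ordinary local Scala collection (the behavior the question wants to reproduce on an `RDD`):

```scala
// .combinations(k) yields each k-element subset once, preserving element order
val combos = List(1, 2, 3).combinations(2).toList
// combos == List(List(1, 2), List(1, 3), List(2, 3))
```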
4 Answers
  •  忘掉有多难
    2020-11-30 04:56

    This creates all (n choose 2) combinations and works for any RDD, without requiring any ordering on the elements of the RDD.

    // pair each element with a unique index, then keep one ordered pair per combination
    val rddWithIndex = rdd.zipWithIndex
    rddWithIndex.cartesian(rddWithIndex).filter { case (a, b) => a._2 < b._2 }.map { case (a, b) => (a._1, b._1) }
    

    a._2 and b._2 are the indices, while a._1 and b._1 are the elements of the original RDD.

    Example:

    Note that no ordering is defined on the maps here.

    val m1 = Map('a' -> 1, 'b' -> 2)
    val m2 = Map('c' -> 3, 'a' -> 4)
    val m3 = Map('e' -> 5, 'c' -> 6, 'b' -> 7)
    val rdd = sc.makeRDD(Array(m1, m2, m3))
    val rddWithIndex = rdd.zipWithIndex
    rddWithIndex.cartesian(rddWithIndex).filter{case(a, b) => a._2 < b._2}.map{case(a, b) => (a._1, b._1)}.collect
    

    Output:

    Array((Map(a -> 1, b -> 2),Map(c -> 3, a -> 4)), (Map(a -> 1, b -> 2),Map(e -> 5, c -> 6, b -> 7)), (Map(c -> 3, a -> 4),Map(e -> 5, c -> 6, b -> 7)))
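    The pattern above can be wrapped in a small reusable helper. This is a sketch, assuming a live SparkContext; `pairs` is a hypothetical name, not part of the Spark API:

    ```scala
    import org.apache.spark.rdd.RDD
    import scala.reflect.ClassTag

    // All unordered pairs (combinations of size 2) from an RDD.
    def pairs[T: ClassTag](rdd: RDD[T]): RDD[(T, T)] = {
      val indexed = rdd.zipWithIndex
      indexed.cartesian(indexed)
        .filter { case (a, b) => a._2 < b._2 }  // keep each pair exactly once
        .map { case (a, b) => (a._1, b._1) }    // drop the indices
    }
    ```

    Be aware that `cartesian` materializes n² candidate pairs before filtering, so this approach is quadratic in the size of the RDD.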
    
