How to flatten a collection with Spark/Scala?

Asked by 时光取名叫无心, 2020-12-09 09:38

In Scala I can flatten a collection using:

val array = Array(List("1,2,3").iterator, List("1,4,5").iterator)
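For plain (non-RDD) collections, assuming the goal is simply to merge the iterators' elements into a single sequence, the standard library's `flatten` already does this without Spark — a minimal sketch:

```scala
// Flattening an Array of Iterators with the Scala standard library alone.
// Note each List holds a single string ("1,2,3"), so flattening yields
// those strings in order, not the comma-separated digits.
object FlattenSketch extends App {
  val array = Array(List("1,2,3").iterator, List("1,4,5").iterator)

  // Iterator is IterableOnce, so flatten applies directly
  val flat: Array[String] = array.flatten

  println(flat.mkString(", "))
}
```

`array.flatMap(x => x)` would produce the same result; `flatten` is just the shorthand for flat-mapping with the identity function.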
2 Answers
  • 2020-12-09 09:58

    Try flatMap with an identity map function (y => y):

    scala> val x = sc.parallelize(List(List("a"), List("b"), List("c", "d")))
    x: org.apache.spark.rdd.RDD[List[String]] = ParallelCollectionRDD[1] at parallelize at <console>:12
    
    scala> x.collect()
    res0: Array[List[String]] = Array(List(a), List(b), List(c, d))
    
    scala> x.flatMap(y => y)
    res3: org.apache.spark.rdd.RDD[String] = FlatMappedRDD[3] at flatMap at <console>:15
    
    scala> x.flatMap(y => y).collect()
    res4: Array[String] = Array(a, b, c, d)
    
  • 2020-12-09 10:05

    Use flatMap with the identity function from Predef; this is more readable than x => x, e.g.

    myRdd.flatMap(identity)
    
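    The same identity trick works on ordinary Scala collections, which makes it easy to check the behavior without a SparkContext — a small sketch:

    ```scala
    // identity comes from Predef: identity[A](x: A): A = x
    object IdentitySketch extends App {
      val nested = List(List("a"), List("b"), List("c", "d"))

      val viaLambda   = nested.flatMap(x => x)    // flattens one level
      val viaIdentity = nested.flatMap(identity)  // same result, clearer intent

      println(viaIdentity.mkString(", "))
    }
    ```

    Both calls return `List("a", "b", "c", "d")`; `identity` simply names the `x => x` lambda, and the same equivalence holds for RDD.flatMap.
    
    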