Difference between rdd.collect().toMap and rdd.collectAsMap()?

死守一世寂寞 2021-01-02 10:15

Is there any performance impact when I use collectAsMap on my RDD instead of rdd.collect().toMap?

I have a key-value RDD and I want to convert it to a HashMap. As far as I

2 Answers
  • 2021-01-02 10:38

    The implementation of collectAsMap (in Spark's PairRDDFunctions) is as follows:

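    def collectAsMap(): Map[K, V] = self.withScope {
      val data = self.collect()                           // pulls every (K, V) pair to the driver
      val map = new mutable.HashMap[K, V]
      map.sizeHint(data.length)                           // pre-sizes the hash table
      data.foreach { pair => map.put(pair._1, pair._2) }  // a later duplicate key overwrites an earlier one
      map
    }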
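To see that the work done after collect() is essentially the same either way, here is a minimal plain-Scala sketch (no Spark needed) that runs the same put loop as collectAsMap over a local array standing in for the collected data, and compares the result with toMap. The object and helper names are hypothetical, and the sizeHint call is left out because it only pre-sizes the table. The sketch also shows that both approaches keep the last value seen for a duplicate key.

```scala
import scala.collection.mutable

object CollectAsMapSketch {
  // Hypothetical helper: the same put loop as collectAsMap, run over a local
  // array standing in for the Array[(K, V)] that rdd.collect() returns.
  def toHashMapLikeCollectAsMap[K, V](data: Array[(K, V)]): collection.Map[K, V] = {
    val map = new mutable.HashMap[K, V]
    // One linear pass; a later pair with the same key overwrites the earlier one.
    data.foreach { pair => map.put(pair._1, pair._2) }
    map
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for rdd.collect(); note the duplicate key "a".
    val collected = Array("a" -> 1, "b" -> 2, "a" -> 3)

    val viaLoop  = toHashMapLikeCollectAsMap(collected)
    val viaToMap = collected.toMap

    println(viaLoop("a"))        // 3
    println(viaToMap("a"))       // 3
    println(viaLoop == viaToMap) // true
  }
}
```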
    

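    Thus there is no meaningful performance difference between collect().toMap and collectAsMap: collectAsMap also calls collect under the hood, so in both cases the whole RDD is shipped to the driver, and the only remaining work is a single pass over the collected array to build the map.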
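One difference that does exist is the type of the result. Because the quoted implementation returns a mutable.HashMap, its declared Map cannot be Scala's default immutable Map; it is presumably scala.collection.Map, and this sketch assumes so. collect().toMap, by contrast, yields an immutable Map. A minimal plain-Scala sketch of the two result shapes, with hypothetical names and data:

```scala
import scala.collection.mutable

object ReturnTypeSketch {
  // Hypothetical stand-in for the Array[(K, V)] that rdd.collect() returns.
  val collected: Array[(String, Int)] = Array("x" -> 10, "y" -> 20)

  // Assumed shape of collectAsMap's result: declared as scala.collection.Map,
  // backed at runtime by a mutable.HashMap.
  def likeCollectAsMap: collection.Map[String, Int] = {
    val m = new mutable.HashMap[String, Int]
    collected.foreach { case (k, v) => m.put(k, v) }
    m
  }

  // Shape of collect().toMap: an immutable Map.
  def likeToMap: Map[String, Int] = collected.toMap

  def main(args: Array[String]): Unit = {
    // collection.Map is not a subtype of the default immutable Map, so
    // `val m: Map[String, Int] = likeCollectAsMap` would not compile;
    // an explicit .toMap copy is needed when an immutable Map is required.
    val immutableCopy: Map[String, Int] = likeCollectAsMap.toMap

    println(likeCollectAsMap.isInstanceOf[mutable.HashMap[_, _]]) // true
    println(immutableCopy == likeToMap)                           // true
  }
}
```

If downstream code expects an immutable Map, the extra .toMap on the collectAsMap result is one more driver-side copy, which is small next to the collect itself.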

  • 2021-01-02 11:01

    No difference. Avoid collect() as much as you can: it pulls all of the data onto the driver, which gives up the cluster's parallelism and can run the driver out of memory on a large RDD.
