SPARK Cost of Initializing Database Connection in map / mapPartitions context

伪装坚强ぢ 2021-01-16 11:26

The examples here are borrowed from the Internet; thanks to those with better insights.

The following can be found on various forums in relation to mapPartitions and map:



        
2 answers
  •  [愿得一人]
    2021-01-16 11:51

    In my opinion, the connection should be kept out of the per-record map: create it just once before the map and close it once the task has completed.

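    val newRd = myRdd.mapPartitions { partition =>
      // Create one DB connection per partition; this runs on the executor.
      val connection = new DbConnection

      // partition.map is lazy, so force the results with toList
      // before the connection is closed.
      val newPartition = partition.map { record =>
        readMatchingFromDB(record, connection)
      }.toList

      connection.close()
      newPartition.iterator
    }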
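
    The one-connection-per-partition pattern can be tried without a cluster. What follows is a minimal sketch in plain Scala, not a definitive implementation. FakeConnection, lookup, processPartition and PerPartitionDemo are hypothetical names introduced only for illustration and stand in for a real connection class and for readMatchingFromDB. It shows that Iterator.map is lazy, so the results have to be materialized before the connection is closed.

    ```scala
    // Hypothetical stand-in for a real database connection (assumption, for illustration only).
    final class FakeConnection {
      private var open = true

      // Fails if called after close(), like a real closed connection would.
      def lookup(record: String): String = {
        require(open, "connection already closed")
        s"$record-matched"
      }

      def close(): Unit = { open = false }
    }

    object PerPartitionDemo {
      // Handles one partition: open once, read every record, then close.
      def processPartition(partition: Iterator[String]): Iterator[String] = {
        val connection = new FakeConnection
        try {
          // toList forces the lazy iterator while the connection is still open.
          partition.map(record => connection.lookup(record)).toList.iterator
        } finally {
          connection.close()
        }
      }

      def main(args: Array[String]): Unit = {
        val result = processPartition(Iterator("a", "b")).toList
        println(result) // prints List(a-matched, b-matched)
      }
    }
    ```

    With Spark, a function of this shape could be passed as myRdd.mapPartitions(PerPartitionDemo.processPartition), assuming myRdd is an RDD[String] and the connection is created inside the function rather than on the driver. The trade-off is that toList holds the results of a whole partition in memory at once.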
    
