Apache Spark logging within Scala

野趣味 2020-12-12 19:18

I am looking for a way to log additional data when executing code on Apache Spark nodes, which could help to investigate later some issues that might appear during execution.

7 Answers
  • 2020-12-12 19:45

    If you need some code to be executed before and after a map, filter or another RDD operation, use mapPartitions, where the underlying iterator is passed to you explicitly.

    Example:

    val log = ??? // this gets captured from the driver and produces a serialization error
    rdd.map { x =>
      log.info(x)
      x + 1
    }
    

    Becomes:

    rdd.mapPartitions { it =>
      val log = ??? // this is initialized fresh on the worker, once per partition
      it.map { x =>
        log.info(x)
        x + 1
      }
    }
    

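    For completeness, here is a minimal, self-contained sketch of the same pattern with a concrete logger. It assumes the SLF4J API that ships with Spark; the application name, logger name and toy data are made up for illustration:

    import org.apache.spark.sql.SparkSession
    import org.slf4j.LoggerFactory

    object PartitionLogging {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("partition-logging").getOrCreate()
        val rdd = spark.sparkContext.parallelize(1 to 100, numSlices = 4)

        val incremented = rdd.mapPartitions { it =>
          // Created on the executor, once per partition, so nothing
          // needs to be serialized from the driver.
          val log = LoggerFactory.getLogger("worker-side-logger")
          it.map { x =>
            log.info(s"processing element $x")
            x + 1
          }
        }

        incremented.count() // force evaluation; the messages end up in the executor logs
        spark.stop()
      }
    }

    Note that the log lines are written on the executors, so look for them in the executor stderr/log files (for example through the Spark UI), not in the driver output.
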
    Every basic RDD transformation is ultimately implemented in terms of mapPartitions, so there is no extra cost in using it directly.

    Make sure to handle the partitioner explicitly so that you do not lose it: see the Scaladoc for the preservesPartitioning parameter of mapPartitions; this is critical for performance.
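    As a hedged sketch of that point, continuing from the example above (same SparkSession and SLF4J import), logging over keyed data without dropping the partitioner could look like this; the reduceByKey step is only there to give the RDD a partitioner:

    val pairs = spark.sparkContext
      .parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
      .reduceByKey(_ + _) // gives the RDD a HashPartitioner

    val logged = pairs.mapPartitions({ it =>
      val log = LoggerFactory.getLogger("worker-side-logger")
      it.map { kv =>
        log.info(s"seen key ${kv._1}")
        kv // keys are left untouched...
      }
    }, preservesPartitioning = true) // ...so it is safe to keep the partitioner

    assert(logged.partitioner == pairs.partitioner)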
