I am looking for a solution to be able to log additional data when executing code on Apache Spark nodes, which could help investigate later some issues that might appear during execution.
If you need some code to be executed before and after a map, filter, or other RDD function, try to use mapPartitions, where the underlying iterator is passed explicitly.
Example:
val log = ??? // this gets captured and produces a serialization error
rdd.map { x =>
  log.info(x)
  x + 1
}
Becomes:
rdd.mapPartitions { it =>
  val log = ??? // this is freshly initialized on the worker nodes
  it.map { x =>
    log.info(x)
    x + 1
  }
}
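For completeness, here is a minimal self-contained sketch of this pattern; the choice of Log4j as the logging backend and the names PerPartitionLogging and "worker-side" are assumptions for illustration, any per-JVM logger works the same way:

import org.apache.log4j.Logger
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative object and logger names, not a prescribed setup.
object PerPartitionLogging {
  def main(args: Array[String]): Unit = {
    // Local master so the sketch can be run as a quick test.
    val conf = new SparkConf().setAppName("per-partition-logging").setMaster("local[4]")
    val sc = new SparkContext(conf)
    val rdd = sc.parallelize(1 to 100, numSlices = 4)

    val incremented = rdd.mapPartitions { it =>
      // Initialized on the worker JVM, so nothing non-serializable
      // is captured by the closure shipped from the driver.
      val log = Logger.getLogger("worker-side")
      it.map { x =>
        log.info(s"processing $x")
        x + 1
      }
    }

    println(incremented.count()) // forces evaluation; log lines appear in the executor logs
    sc.stop()
  }
}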
Under the hood, every basic RDD function is implemented with mapPartitions.
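As a sketch of that equivalence (mapViaPartitions is a hypothetical helper, not a Spark API), a plain map can be rewritten as a mapPartitions over each partition's iterator:

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hypothetical helper, not part of Spark: behaves like rdd.map(f).
def mapViaPartitions[T, U: ClassTag](rdd: RDD[T])(f: T => U): RDD[U] =
  rdd.mapPartitions(iter => iter.map(f))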
Make sure to handle the partitioner explicitly and not to lose it: see the Scaladoc for the preservesPartitioning parameter of mapPartitions; this is critical for performance.
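For example, here is a minimal sketch (reusing the SparkContext sc and the illustrative "worker-side" logger name from the sketch above) showing that preservesPartitioning = true keeps the partitioner across a mapPartitions when the keys are left untouched:

import org.apache.log4j.Logger
import org.apache.spark.HashPartitioner

val pairs = sc.parallelize(1 to 100).map(x => (x % 10, x)).partitionBy(new HashPartitioner(4))

// The keys are not modified, so the existing HashPartitioner is still valid.
val logged = pairs.mapPartitions(
  { it =>
    val log = Logger.getLogger("worker-side")
    it.map { case (k, v) =>
      log.info(s"key $k value $v")
      (k, v)
    }
  },
  preservesPartitioning = true
)

assert(logged.partitioner == pairs.partitioner) // the partitioner survives, no later re-shuffle needed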