I am looking for a way to log additional data when executing code on Apache Spark nodes, which could help investigate issues that might appear during execution.
This is an old post, but I want to share the working solution I finally arrived at after a lot of struggling; it may still be useful for others.
I wanted to print RDD contents inside an rdd.map function, but kept getting a Task Not Serializable error. This is my solution to that problem, using a Scala singleton object that extends java.io.Serializable:
import org.apache.log4j.{Level, LogManager}
import org.apache.spark.{SparkConf, SparkContext}

object MyClass extends Serializable {
  val log = LogManager.getLogger("name of my spark log")
  log.setLevel(Level.INFO)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rdd-logging"))
    val rdd = sc.parallelize(Seq(1, 2, 3))

    // foreach is an action, so the logging actually runs (a bare
    // rdd.map is lazy and would never execute). Also note you cannot
    // call rdd.count inside the closure: nested RDD operations are
    // not allowed on executors, so log the elements themselves.
    rdd.foreach { t =>
      // Use the object's logger here. Because MyClass is a singleton
      // object, the logger is looked up on each executor JVM rather
      // than serialized with the closure.
      val log = MyClass.log
      log.info("element: " + t)
    }

    sc.stop()
  }
}
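
Keep in mind that executor-side log output does not come back to the driver console: each executor writes to its own stderr, which you can inspect per executor in the Spark web UI, or with yarn logs -applicationId <appId> when running on YARN.

A variant of the same idea, in case the logging has to happen inside a class whose instances are shipped to executors: mark the logger @transient lazy, so it is excluded from serialization and re-created on first use in each executor JVM. This is a minimal sketch; the class name Mapper and the logger name are illustrative, not from the original post.

import org.apache.log4j.{LogManager, Logger}

// Hypothetical helper whose instances get captured in Spark closures.
class Mapper(n: Int) extends Serializable {
  // @transient: skipped during serialization; lazy: re-initialized
  // on first access, i.e. once per executor JVM.
  @transient lazy val log: Logger = LogManager.getLogger("executor log")

  def multiply(x: Int): Int = {
    log.info("multiplying " + x)
    x * n
  }
}

You would then call something like rdd.map(new Mapper(10).multiply): the Mapper instance serializes cleanly because the logger field is never part of the serialized closure.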