Apache Spark logging within Scala

野趣味 2020-12-12 19:18

I am looking for a solution to be able to log additional data when executing code on Apache Spark nodes, which could later help investigate issues that might appear during execution.

7 Answers
  • 2020-12-12 19:20

    Here is my solution:

    I am using SLF4J (with the Log4j binding). In the base class of every Spark job I have something like this:

    import org.slf4j.LoggerFactory
    val LOG = LoggerFactory.getLogger(getClass) 
    

    Just before the place where I use LOG in distributed functional code, I copy the logger reference to a local constant:

    val LOG = this.LOG
    

    It worked for me!
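
    For illustration, here is a minimal sketch of how the two snippets above fit together. The SparkJob/CleanupJob names and the RDD are hypothetical, and it assumes the slf4j-log4j12 binding, whose logger adapter supports being captured in a serialized closure:

    import org.slf4j.LoggerFactory
    import org.apache.spark.rdd.RDD

    // Hypothetical base class shared by Spark jobs
    abstract class SparkJob {
      val LOG = LoggerFactory.getLogger(getClass)
    }

    // Hypothetical job that logs from inside a distributed closure
    class CleanupJob extends SparkJob {
      def run(records: RDD[String]): Long = {
        // Copy the logger reference to a local constant so the closure
        // captures only this local val, not the enclosing job instance
        val log = this.LOG
        records.filter { r =>
          val keep = r.trim.nonEmpty
          if (!keep) log.warn("dropping blank record")
          keep
        }.count()
      }
    }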

  • 2020-12-12 19:22

    This is an old post, but I want to share the working solution I arrived at after a lot of struggling; it may still be useful to others:

    I wanted to print RDD contents inside an rdd.map function but kept getting a Task not serializable error. My solution is to use a Scala object that extends java.io.Serializable:

    import org.apache.log4j.{Level, LogManager}
    import org.apache.spark.{SparkConf, SparkContext}

    object MyClass extends Serializable {

      // The logger lives in a Scala object, so each executor looks it up
      // locally instead of having it serialized with the closure
      val log = LogManager.getLogger("name of my spark log")
      log.setLevel(Level.INFO)

      def main(args: Array[String]): Unit = {
        // Hypothetical context and data so the example runs end to end
        val sc = new SparkContext(new SparkConf().setAppName("rdd-logging"))
        val rdd = sc.parallelize(Seq(1, 2, 3))

        rdd.map { t =>
          // Using the object's logger here
          val log = MyClass.log
          log.info("element: " + t)
          t
        }.count() // an action is needed so the map (and the logging) actually runs
      }
    }
    
  • 2020-12-12 19:24

    Use Log4j 2.x. The core logger has been made serializable. Problem solved.

    Jira discussion: https://issues.apache.org/jira/browse/LOG4J2-801

    "org.apache.logging.log4j" % "log4j-api" % "2.x.x"
    
    "org.apache.logging.log4j" % "log4j-core" % "2.x.x"
    
    "org.apache.logging.log4j" %% "log4j-api-scala" % "2.x.x"
    
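    For illustration, a minimal sketch using the Logging trait from the Log4j Scala API inside a Spark closure; the RowLogger object and the RDD are hypothetical, and whether the underlying logger is serializable depends on the Log4j 2 version (see the Jira ticket above):

    import org.apache.logging.log4j.scala.Logging
    import org.apache.spark.rdd.RDD

    // Hypothetical helper; the Logging trait provides a `logger` field
    object RowLogger extends Logging with Serializable {
      def logAll(rdd: RDD[Int]): Unit =
        rdd.foreach(x => logger.info(s"processing element $x"))
    }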
  • 2020-12-12 19:25

    Making the logger transient and lazy does the trick:

    @transient lazy val log = Logger.getLogger(getClass.getName)

    @transient tells Spark not to serialize the logger to the executors, and lazy causes the instance to be created the first time it is used. In other words, each executor ends up with its own instance of the logger. Serializing the logger is not a good idea anyway, even when it is possible.
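
    For example, a minimal sketch of this pattern (the class and field names are illustrative):

    import org.apache.log4j.Logger
    import org.apache.spark.rdd.RDD

    // Hypothetical job class whose instances are shipped to executors;
    // @transient keeps the logger out of the serialized closure and
    // lazy re-creates it on first use on each executor
    class KeywordFilter(keyword: String) extends Serializable {
      @transient lazy val log = Logger.getLogger(getClass.getName)

      def apply(lines: RDD[String]): RDD[String] =
        lines.filter { line =>
          val matched = line.contains(keyword)
          if (matched) log.info(s"matched: $line")
          matched
        }
    }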

    Of course, anything you put inside the map() closure runs on an executor, so its output shows up in the executor logs rather than the driver logs. To use custom log4j properties on the executors, you have to ship your log4j.properties file to the executors and add it to their classpath.

    This can be done by adding the following arguments to your spark-submit command: --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=./log4j.properties" --files ./log4j.properties. There are other ways to set these configs, but this one is the most common.

  • 2020-12-12 19:28
    val log = Logger.getLogger(getClass.getName)
    

    You can then use log to write log messages. If you need to change the logger properties, put a log4j.properties file in the conf/ folder of your Spark installation; by default there is a template (log4j.properties.template) at that location.
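
    For example, a minimal driver-side sketch (the object name is illustrative; the output format and level come from that log4j.properties file):

    import org.apache.log4j.Logger
    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical driver program; these log lines go to the driver logs
    object DriverLoggingExample {
      val log = Logger.getLogger(getClass.getName)

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("driver-logging"))
        val n = sc.parallelize(1 to 100).count()
        log.info(s"counted $n elements")
        sc.stop()
      }
    }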

  • 2020-12-12 19:32

    You can use the solution Akhil proposed in
    https://www.mail-archive.com/user@spark.apache.org/msg29010.html. I have used it myself and it works.

    Akhil Das Mon, 25 May 2015 08:20:40 -0700
    Try this way:

    import org.apache.log4j.Logger

    object Holder extends Serializable {
      @transient lazy val log = Logger.getLogger(getClass.getName)
    }

    // sc is the SparkContext; foreach runs on the executors
    sc.parallelize(List(1, 2, 3)).foreach { element =>
      Holder.log.info(element)
    }
    