How can I set the default Spark logging level?


Question


I launch PySpark applications from PyCharm on my own workstation against an 8-node cluster. The cluster also has settings encoded in spark-defaults.conf and spark-env.sh.

This is how I obtain my Spark session and context:

from pyspark.sql import SparkSession  # import needed to make this snippet runnable

spark = SparkSession \
    .builder \
    .master("spark://stcpgrnlp06p.options-it.com:7087") \
    .appName(__SPARK_APP_NAME__) \
    .config("spark.executor.memory", "50g") \
    .config("spark.eventLog.enabled", "true") \
    .config("spark.eventLog.dir", r"/net/share/grid/bin/spark/UAT/SparkLogs/") \
    .config("spark.cores.max", 128) \
    .config("spark.sql.crossJoin.enabled", "true") \
    .config("spark.executor.extraLibraryPath", "/net/share/grid/bin/spark/UAT/bin/vertica-jdbc-8.0.0-0.jar") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.logConf", "true") \
    .getOrCreate()

sc = spark.sparkContext
sc.setLogLevel("INFO")

I want to see the effective config that is being used, in my log. The line

        .config("spark.logConf", "true") \

should cause the Spark API to log its effective config as INFO, but the default log level is set to WARN, so I never see those messages.

Calling

sc.setLogLevel("INFO")

shows INFO messages from that point forward, but by then it's too late.

How can I set the default logging level that Spark starts with?


Answer 1:


http://spark.apache.org/docs/latest/configuration.html#configuring-logging

Configuring Logging

Spark uses log4j for logging. You can configure it by adding a log4j.properties file in the conf directory. One way to start is to copy the existing log4j.properties.template located there.


The blog post "How to log in Apache Spark" (https://www.mapr.com/blog/how-log-apache-spark) suggests a way to configure log4j, including directing INFO-level logs into a file.
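A minimal log4j.properties sketch along those lines, assuming the log4j 1.x syntax used by Spark 1.x/2.x (the log file path is a placeholder, not something from the answer):

log4j.rootCategory=INFO, file
# FileAppender writes all INFO-and-above events to a single log file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/tmp/spark-app.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n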




Answer 2:


You need to edit your $SPARK_HOME/conf/log4j.properties file (create it if it doesn't exist). If you submit your code via spark-submit, you want this line:

log4j.rootCategory=INFO, console

If you want INFO-level logs in your pyspark console, then you need this line:

log4j.logger.org.apache.spark.api.python.PythonGatewayServer=INFO
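Putting both together, a minimal complete sketch. The console appender definition below is copied from Spark's log4j.properties.template; it must be present for the "console" target named in the first line to resolve:

log4j.rootCategory=INFO, console
# console appender definition, as in log4j.properties.template
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# surface INFO output in the pyspark console as well
log4j.logger.org.apache.spark.api.python.PythonGatewayServer=INFO

If you cannot change the cluster-wide conf directory, a commonly used per-application alternative (an assumption, not part of the original answer) is to point the driver at your own file with .config("spark.driver.extraJavaOptions", "-Dlog4j.configuration=file:/path/to/log4j.properties") before getOrCreate().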



Source: https://stackoverflow.com/questions/40608412/how-can-set-the-default-spark-logging-level
