How do I log from my Python Spark script

南旧 2020-11-30 01:51

I have a Python Spark program which I run with spark-submit. I want to put logging statements in it.

logging.info("This is an informative message.")


        
6 Answers
  •  醉话见心
    2020-11-30 02:16

    You need to get the logger for Spark itself; by default, getLogger() returns the logger for your own module. Try something like:

    logger = logging.getLogger('py4j')
    logger.info("My test info statement")
    

    It might also be 'pyspark' instead of 'py4j'.
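
    A minimal sketch of the approach above, runnable outside Spark: configure the root logger once with basicConfig, then log through the 'py4j' logger so your messages use the same plumbing as Spark's bridge output. The format string here is an illustrative choice, not something Spark requires.

    ```python
    import logging

    # Configure the root logger once, near the top of the driver script.
    # level=logging.INFO ensures INFO records are not filtered out.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    )

    # Grab the logger the Py4J bridge uses; records logged through it
    # propagate to the root handler configured above.
    logger = logging.getLogger("py4j")
    logger.info("My test info statement")
    ```

    If this stays silent, try getLogger("pyspark") instead, as noted above.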

    If the function you use in your Spark program (the one that does the logging) is defined in the same module as the main function, you may get a serialization error.

    This is explained here, and an example by the same person is given here.

    I also tested this on Spark 1.3.1.

    EDIT:

    To redirect logging from STDERR to STDOUT, remove the current StreamHandler and add a new one.

    Find the existing StreamHandler (this diagnostic line can be removed when finished):

    print(logger.handlers)
    # will look something like [<StreamHandler <stderr> (NOTSET)>]
    

    There will probably be only one handler, but if not, adjust the index below accordingly.

    logger.removeHandler(logger.handlers[0])
    

    Add a new handler for sys.stdout:

    import sys # Put at top if not already there
    sh = logging.StreamHandler(sys.stdout)
    sh.setLevel(logging.DEBUG)
    logger.addHandler(sh)
    
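
    Putting the three steps together, a self-contained sketch (assuming plain Python; inside Spark you would use the 'py4j' or 'pyspark' logger obtained as above):

    ```python
    import logging
    import sys

    logger = logging.getLogger("py4j")
    logger.setLevel(logging.DEBUG)

    # Step 1-2: drop any handlers currently attached (e.g. a stderr
    # StreamHandler). Iterate over a copy, since removeHandler mutates
    # the underlying list.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)

    # Step 3: attach a fresh handler that writes to stdout instead of stderr.
    sh = logging.StreamHandler(sys.stdout)
    sh.setLevel(logging.DEBUG)
    logger.addHandler(sh)

    logger.info("now routed to stdout")
    ```

    Iterating over a copy of logger.handlers is slightly more defensive than indexing handlers[0]; it behaves the same when there is one handler and still works when there are several or none.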
