How do I log from my Python Spark script?

南旧 2020-11-30 01:51

I have a Python Spark program which I run with spark-submit. I want to put logging statements in it.

logging.info("This is an informative messa
6 answers
  •  南笙 (original poster)
    2020-11-30 02:07

    In my case, I am just happy to get my log messages added to the workers' stderr, along with the usual Spark log messages.

    If that suits your needs, then the trick is to redirect the particular Python logger to stderr.

    For example, the following, inspired by this answer, works fine for me:

    # imports must be at module level: the default level=logging.INFO is
    # evaluated when the function is defined
    import logging
    import sys
    
    def getlogger(name, level=logging.INFO):
        logger = logging.getLogger(name)
        logger.setLevel(level)
        # attach the handler only once; otherwise, as I found out, we keep
        # adding handlers and get duplicate messages
        if not logger.handlers:
            ch = logging.StreamHandler(sys.stderr)
            ch.setLevel(level)
            formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
            ch.setFormatter(formatter)
            logger.addHandler(ch)
        return logger
    

    Usage:

    def tst_log():
        logger = getlogger('my-worker')
        logger.debug('a')
        logger.info('b')
        logger.warning('c')
        logger.error('d')
        logger.critical('e')
        ...
    

    Output (plus a few surrounding lines for context; the debug message 'a' is dropped because the logger level defaults to INFO):

    17/05/03 03:25:32 INFO MemoryStore: Block broadcast_24 stored as values in memory (estimated size 5.8 KB, free 319.2 MB)
    2017-05-03 03:25:32,849 - my-worker - INFO - b
    2017-05-03 03:25:32,849 - my-worker - WARNING - c
    2017-05-03 03:25:32,849 - my-worker - ERROR - d
    2017-05-03 03:25:32,849 - my-worker - CRITICAL - e
    17/05/03 03:25:32 INFO PythonRunner: Times: total = 2, boot = -40969, init = 40971, finish = 0
    17/05/03 03:25:32 INFO Executor: Finished task 7.0 in stage 20.0 (TID 213). 2109 bytes result sent to driver
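
To illustrate why the handler check matters, here is a minimal sketch that runs without Spark. It assumes the same helper as above and a hypothetical per-partition function, process_partition, which is not part of the original answer. Plain Python ranges stand in for partitions; in a real job, a function like this would presumably be passed to something like rdd.mapPartitions. Because logging.getLogger returns the same logger object for a given name within one process, calling the helper once per task would stack handlers if the check were missing.

```python
import logging
import sys


def getlogger(name, level=logging.INFO):
    """Return a logger that writes to stderr, attaching its handler only once."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setLevel(level)
        handler.setFormatter(logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
        logger.addHandler(handler)
    return logger


def process_partition(rows):
    """Hypothetical per-partition function: doubles each row, then logs a count."""
    logger = getlogger('my-worker')  # safe to call once per task
    count = 0
    for row in rows:
        count += 1
        yield row * 2
    logger.info('processed %d rows', count)


# Simulate three tasks running in the same Python process
# (plain ranges stand in for Spark partitions).
results = [list(process_partition(range(n))) for n in (3, 2, 0)]

# The logger still has a single handler, so each message is emitted once.
handler_count = len(logging.getLogger('my-worker').handlers)
```

Each simulated task writes one "processed N rows" line to stderr, and handler_count stays at 1 however many times getlogger is called.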
    
