ProcessPoolExecutor logging fails to log inside function on Windows but not on Unix / Mac

霸气de小男生 提交于 2019-12-10 13:21:02

问题


When I run the following script on a Windows computer, I do not see any of the log messages from the log_pid function, however I do when I run on Unix / Mac. I've read before that multiprocessing is different on Windows compared to Mac, but it's not clear to me what changes should I make to get this script to work on Windows. I'm running Python 3.6.

import logging
import sys
from concurrent.futures import ProcessPoolExecutor
import os


def log_pid(x):
    logger.info('Executing on process: %s' % os.getpid())


def do_stuff():
    logger.info('this is the do stuff function.')
    with ProcessPoolExecutor(max_workers=4) as executor:
        executor.map(log_pid, range(0, 10))


def main():
    logger.info('this is the main function.')
    do_stuff()


if __name__ == '__main__':
    logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
    logger = logging.getLogger(__name__)

    logger.info('Start of script ...')

    main()

    logger.info('End of script ...')

回答1:


Unix processes are created via the fork strategy where the child gets cloned from the parent and continues its execution right at the moment the parent forked.

On Windows is quite different: a blank process is created and a new Python interpreter gets launched. The interpreter will then load the module where the log_pid function is located and execute it.

This means the __main__ section is not executed by the newly spawned child process. Hence, the logger object is not created and the log_pid function crashes accordingly. You don't see the error because you ignore the result of your computation. Try to modify the logic as follows.

def do_stuff():
    logger.info('this is the do stuff function.')
    with ProcessPoolExecutor(max_workers=4) as executor:
        iterator = executor.map(log_pid, range(0, 10))
        list(iterator)  # collect the results in a list

And the issue will become evident.

Traceback (most recent call last):
  File "C:\Program Files (x86)\Python36-32\lib\concurrent\futures\process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\concurrent\futures\process.py", line 153, in _process_chunk
    return [fn(*args) for args in chunk]
  File "C:\Program Files (x86)\Python36-32\lib\concurrent\futures\process.py", line 153, in <listcomp>
    return [fn(*args) for args in chunk]
  File "C:\Users\cafama\Desktop\pool.py", line 8, in log_pid
    logger.info('Executing on process: %s' % os.getpid())
NameError: name 'logger' is not defined

When dealing with process pools (whether concurrent.futures or multiprocessing ones) always collect the result of the computation to avoid silent bugs to cause confusion.

To fix the problem, just move the logger creation at the top level of the module and everything will work on all platforms.

import logging
import sys
from concurrent.futures import ProcessPoolExecutor
import os

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logger = logging.getLogger(__name__)   

def log_pid(x):
    logger.info('Executing on process: %s' % os.getpid())

...


来源:https://stackoverflow.com/questions/49782749/processpoolexecutor-logging-fails-to-log-inside-function-on-windows-but-not-on-u

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!