Pipe a large amount of data to stdin while using subprocess.Popen

花落未央 2020-12-08 11:08

I'm struggling to understand the Python way of solving this simple problem.

My problem is quite simple. If you use the following code, it will hang.
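
(The code was truncated in this copy of the question; below is a minimal reconstruction, based on the accepted answer, of the kind of code that deadlocks (the parent keeps writing to cat's stdin but never drains its stdout pipe):)

    #!/usr/bin/env python3
    import subprocess
    
    process = subprocess.Popen(['cat'], stdin=subprocess.PIPE,
                               stdout=subprocess.PIPE,
                               universal_newlines=True)
    for i in range(100000):
        print(i, file=process.stdin)  # hangs once cat's stdout pipe buffer fills up
    process.stdin.close()
    print(process.stdout.read())
    process.wait()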

10 Answers
  •  北海茫月
    2020-12-08 11:52

    Your code deadlocks as soon as cat's stdout OS pipe buffer is full. If you use stdout=PIPE, you have to consume the output in time; otherwise a deadlock, as in your case, may happen.

    If you don't need the output while the process is running, you could redirect it to a temporary file:

    #!/usr/bin/env python3
    import subprocess
    import tempfile
    
    with tempfile.TemporaryFile('r+') as output_file:
        with subprocess.Popen(['cat'],
                              stdin=subprocess.PIPE,
                              stdout=output_file,
                              universal_newlines=True) as process:
            for i in range(100000):
                print(i, file=process.stdin)
        output_file.seek(0)  # rewind (and sync with the disk)
        print(output_file.readline(), end='')  # get the first line of the output
    

    If the input/output are small (fit in memory), you could pass the input all at once and get the output all at once using .communicate(), which reads/writes concurrently for you:

    #!/usr/bin/env python3
    import subprocess
    
    cp = subprocess.run(['cat'], input='\n'.join(['%d' % i for i in range(100000)]),
                        stdout=subprocess.PIPE, universal_newlines=True)
    print(cp.stdout.splitlines()[-1]) # print the last line
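
    (The same concurrent read/write is available on a Popen object directly via .communicate(); a minimal sketch using the same cat child:)

    #!/usr/bin/env python3
    import subprocess
    
    with subprocess.Popen(['cat'], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                          universal_newlines=True) as process:
        output, _ = process.communicate('\n'.join('%d' % i for i in range(100000)))
    print(output.splitlines()[-1])  # print the last line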
    

    To read/write concurrently manually, you could use threads, asyncio, fcntl, etc. @Jed provided a simple thread-based solution along those lines.
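
    (A minimal sketch of that thread-based idea, not @Jed's exact code: a writer thread pumps stdin while the main thread consumes stdout, so neither pipe buffer fills up unattended.)

    #!/usr/bin/env python3
    import subprocess
    import threading
    
    process = subprocess.Popen(['cat'], stdin=subprocess.PIPE,
                               stdout=subprocess.PIPE, universal_newlines=True)
    
    def pump_input():
        with process.stdin:  # closes stdin when done so `cat` sees EOF
            for i in range(100000):
                print(i, file=process.stdin)
    
    threading.Thread(target=pump_input).start()
    with process.stdout:
        for line in process.stdout:  # consume output concurrently with the writes
            pass  # replace with real processing of each line
    process.wait()

    Here's an asyncio-based solution: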

    #!/usr/bin/env python3
    import asyncio
    import sys
    from subprocess import PIPE
    
    async def pump_input(writer):
        try:
            for i in range(100000):
                writer.write(b'%d\n' % i)
                await writer.drain()
        finally:
            writer.close()
    
    async def run():
        # start child process
        # NOTE: universal_newlines parameter is not supported
        process = await asyncio.create_subprocess_exec('cat', stdin=PIPE, stdout=PIPE)
        asyncio.ensure_future(pump_input(process.stdin)) # write input
        async for line in process.stdout: # consume output
            print(int(line)**2) # print squares
        return await process.wait()  # wait for the child process to exit
    
    
    if sys.platform.startswith('win'):
        loop = asyncio.ProactorEventLoop() # for subprocess' pipes on Windows
        asyncio.set_event_loop(loop)
    else:
        loop = asyncio.get_event_loop()
    loop.run_until_complete(run())
    loop.close()
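
    (Note: on Python 3.8 and later, ProactorEventLoop is already the default on Windows, so the event-loop boilerplate above can shrink to a single call:)

    asyncio.run(run())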
    

    On Unix, you could use an fcntl-based solution:

    #!/usr/bin/env python3
    import sys
    from fcntl import fcntl, F_GETFL, F_SETFL
    from os import O_NONBLOCK
    from shutil import copyfileobj
    from subprocess import Popen, PIPE, _PIPE_BUF as PIPE_BUF
    
    def make_blocking(pipe, blocking=True):
        fd = pipe.fileno()
        if not blocking:
            fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK) # set O_NONBLOCK
        else:
            fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) & ~O_NONBLOCK) # clear it
    
    
    with Popen(['cat'], stdin=PIPE, stdout=PIPE) as process:
        make_blocking(process.stdout, blocking=False)
        with process.stdin:
            for i in range(100000):
                # NOTE: stdin is block-buffered (the default), therefore
                # `cat` won't see the data immediately
                process.stdin.write(b'%d\n' % i)
                # a deadlock may happen here with a *blocking* pipe
                output = process.stdout.read(PIPE_BUF)
                if output is not None:
                    sys.stdout.buffer.write(output)
        # read the rest
        make_blocking(process.stdout)
        copyfileobj(process.stdout, sys.stdout.buffer)
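
    (On Python 3.5+, the fcntl calls can be replaced with os.set_blocking(); a sketch of the equivalent call:)

    import os
    # equivalent to make_blocking(process.stdout, blocking=False) above
    os.set_blocking(process.stdout.fileno(), False)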
    
