blocks - send input to python subprocess pipeline

轻奢々 2021-01-30 09:22

I'm testing subprocess pipelines with Python. I'm aware that I can do what the programs below do in Python directly, but that's not the point. I just want to test the pipel

11 answers
  •  萌比男神i
    2021-01-30 09:31

    In one of the comments above, I challenged nosklo to either post some code to back up his assertions about select.select or to upvote my responses he had previously down-voted. He responded with the following code:

    from subprocess import Popen, PIPE
    import select
    
    p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
    p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE, close_fds=True)
    
    data_to_write = 100000 * 'hello world\n'
    to_read = [p2.stdout]
    to_write = [p1.stdin]
    b = [] # create buffer
    written = 0
    
    
    while to_read or to_write:
        read_now, write_now, xlist = select.select(to_read, to_write, [])
        if read_now:
            data = p2.stdout.read(1024)
            if not data:
                p2.stdout.close()
                to_read = []
            else:
                b.append(data)
    
        if write_now:
            if written < len(data_to_write):
                part = data_to_write[written:written+1024]
                written += len(part)
                p1.stdin.write(part); p1.stdin.flush()
            else:
                p1.stdin.close()
                to_write = []
    
    print b
    

    One problem with this script is that it second-guesses the size and behavior of the system pipe buffers. It would fail less often if it did not depend on magic numbers like 1024.
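
    As a rough sketch of what dropping the hard-coded write size could look like (in Python 3, on Unix only), the standard library exposes select.PIPE_BUF, which is documented as the number of bytes that can be written to a pipe without blocking once select() has reported it writable. The helper name iter_chunks and the variable names below are hypothetical, chosen for illustration. This only covers the write side; the read size of 1024 is a separate question.

```python
import select

# select.PIPE_BUF (Unix only): at least this many bytes can be written to a
# pipe without blocking once select() has reported the pipe as writable.
CHUNK_SIZE = select.PIPE_BUF


def iter_chunks(data, size=CHUNK_SIZE):
    """Yield successive slices of data, each at most size bytes long."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


# Same payload as in the script above, as bytes for Python 3.
payload = 100000 * b"hello world\n"
chunks = list(iter_chunks(payload))
print(len(chunks), "chunks of at most", CHUNK_SIZE, "bytes")
```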

    The big problem is that this script only works consistently with the right combination of input data and external programs. grep and cut both work with lines, so their internal buffers behave a bit differently. If we use a more generic command like "cat" and write smaller pieces of data into the pipe, the fatal race condition shows up more often:

    from subprocess import Popen, PIPE
    import select
    import time
    
    p1 = Popen(["cat"], stdin=PIPE, stdout=PIPE)
    p2 = Popen(["cat"], stdin=p1.stdout, stdout=PIPE, close_fds=True)
    
    data_to_write = 'hello world\n'
    to_read = [p2.stdout]
    to_write = [p1.stdin]
    b = [] # create buffer
    written = 0
    
    
    while to_read or to_write:
        time.sleep(1)
        read_now, write_now, xlist = select.select(to_read, to_write, [])
        if read_now:
            print 'I am reading now!'
            data = p2.stdout.read(1024)
            if not data:
                p2.stdout.close()
                to_read = []
            else:
                b.append(data)
    
        if write_now:
            print 'I am writing now!'
            if written < len(data_to_write):
                part = data_to_write[written:written+1024]
                written += len(part)
                p1.stdin.write(part); p1.stdin.flush()
            else:
                print 'closing file'
                p1.stdin.close()
                to_write = []
    
    print b
    

    In this case, one of two outcomes will occur:

    write, write, close file, read -> success
    write, read -> hang
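
    A likely explanation for the hang, sketched here in Python 3 with a plain os.pipe() standing in for p2.stdout (an assumption made to keep the example self-contained): select() reports the pipe readable as soon as any data arrives, but a buffered read(1024) keeps waiting for 1024 bytes or EOF, and EOF cannot arrive until the write end is closed. The unbuffered os.read() call below shows how little data is actually there.

```python
import os
import select

# A plain OS pipe stands in for the p2.stdout end of the pipeline.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"hello world\n")  # 12 bytes; the write end stays open

# select() reports the pipe readable as soon as any data is available.
readable, _, _ = select.select([read_fd], [], [], 1.0)

# os.read() returns what is available (up to 1024 bytes) without waiting
# for the remaining bytes.
data = os.read(read_fd, 1024)
print(len(data), "bytes available, although 1024 were requested")

# A buffered file object's read(1024) on this pipe would instead block here,
# because it waits for 1024 bytes or EOF while write_fd is still open.
os.close(write_fd)
os.close(read_fd)
```

    This only illustrates the mechanism behind the "write, read" case; the single-threaded loop still has to coordinate both ends of the pipeline.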
    

    So again, I challenge nosklo to either post code showing the use of select.select to handle arbitrary input and pipe buffering from a single thread, or to upvote my responses.

    Bottom line: don't try to manipulate both ends of a pipe from a single thread. It's just not worth it. See pipeline for a nice low-level example of how to do this correctly.
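
    For comparison, here is a minimal Python 3 sketch of one way to keep the two ends of the pipe in separate threads. It is not necessarily what the linked example does; the helper name feed is hypothetical, and "cat" is used for both stages as in the test script above. A writer thread feeds p1.stdin and closes it, while the main thread does nothing but read p2.stdout.

```python
import threading
from subprocess import Popen, PIPE


def feed(pipe, data):
    """Write all of data to pipe, then close it so the child sees EOF."""
    try:
        pipe.write(data)
    finally:
        pipe.close()


p1 = Popen(["cat"], stdin=PIPE, stdout=PIPE)
p2 = Popen(["cat"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # the parent drops its copy; p2 holds the read end

data_to_write = 100000 * b"hello world\n"
writer = threading.Thread(target=feed, args=(p1.stdin, data_to_write))
writer.start()

# The main thread only reads, so a blocking read cannot stall the writer.
output = p2.stdout.read()
p2.stdout.close()

writer.join()
p1.wait()
p2.wait()
print(len(output), "bytes read back")
```

    Because the read and the write never wait on each other inside one loop, neither the chunk size nor the buffering of the child programs matters for correctness.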
