I'm testing subprocess pipelines with Python. I'm aware that I can do what the programs below do in Python directly, but that's not the point. I just want to test the pipeline…
In one of the comments above, I challenged nosklo to either post some code to back up his assertions about select.select or to upvote my responses that he had previously downvoted. He responded with the following code:
from subprocess import Popen, PIPE
import select
# grep | cut pipeline: the parent writes into p1 and reads from p2.
p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE, close_fds=True)
data_to_write = 100000 * 'hello world\n'
to_read = [p2.stdout]
to_write = [p1.stdin]
b = [] # create buffer
written = 0
while to_read or to_write:
    # Wait until the tail is readable and/or the head is writable.
    read_now, write_now, xlist = select.select(to_read, to_write, [])
    if read_now:
        data = p2.stdout.read(1024)
        if not data:
            # EOF from cut: stop watching its stdout.
            p2.stdout.close()
            to_read = []
        else:
            b.append(data)
    if write_now:
        if written < len(data_to_write):
            # Feed the next 1024-byte slice into grep.
            part = data_to_write[written:written+1024]
            written += len(part)
            p1.stdin.write(part)
            p1.stdin.flush()
        else:
            # All input sent: closing stdin gives grep EOF.
            p1.stdin.close()
            to_write = []
print b
One problem with this script is that it second-guesses the size and behavior of the system pipe buffers. The script would fail less often if it did not depend on magic numbers like 1024.
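If a select() loop is kept anyway, one way to avoid guessing a write size is Python's select.PIPE_BUF (Unix only), which the Python documentation describes as the minimum number of bytes that can be written to a pipe without blocking once select() has reported it writable. The sketch below assumes Python 3 and writes through the raw descriptor with os.write (for example p1.stdin.fileno()) instead of the buffered file object; the helper name write_ready_chunk is hypothetical. It only covers the write side of the loop.

```python
import os
import select


def write_ready_chunk(fd: int, data: bytes, offset: int) -> int:
    """Hypothetical helper: write the next chunk of data[offset:] to a pipe.

    Assumes fd is the raw write end of a pipe that select() has just
    reported as writable. The chunk is capped at select.PIPE_BUF bytes,
    so the write is not expected to block. Returns the new offset.
    """
    chunk = data[offset:offset + select.PIPE_BUF]
    return offset + os.write(fd, chunk)


if __name__ == "__main__":
    r, w = os.pipe()
    payload = b"hello world\n" * 1000
    offset = 0
    # Only write once select() says the write end has room.
    _, writable, _ = select.select([], [w], [], 1.0)
    if writable:
        offset = write_ready_chunk(w, payload, offset)
    print(offset, "of", len(payload), "bytes written; PIPE_BUF =", select.PIPE_BUF)
    os.close(w)
    os.close(r)
```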
The big problem is that this script only works consistently with the right combination of input data and external programs. grep and cut both operate on lines, so their internal buffering behaves a bit differently. If we use a more generic command like "cat" and write smaller amounts of data into the pipe, the fatal race condition shows up more often:
from subprocess import Popen, PIPE
import select
import time
# cat | cat pipeline, fed with a single short line.
p1 = Popen(["cat"], stdin=PIPE, stdout=PIPE)
p2 = Popen(["cat"], stdin=p1.stdout, stdout=PIPE, close_fds=True)
data_to_write = 'hello world\n'
to_read = [p2.stdout]
to_write = [p1.stdin]
b = [] # create buffer
written = 0
while to_read or to_write:
    # Slow the loop down so the order of reads and writes is easy to follow.
    time.sleep(1)
    read_now, write_now, xlist = select.select(to_read, to_write, [])
    if read_now:
        print 'I am reading now!'
        data = p2.stdout.read(1024)
        if not data:
            # EOF: close the stream we were reading (p2.stdout, as in the first script).
            p2.stdout.close()
            to_read = []
        else:
            b.append(data)
    if write_now:
        print 'I am writing now!'
        if written < len(data_to_write):
            part = data_to_write[written:written+1024]
            written += len(part)
            p1.stdin.write(part)
            p1.stdin.flush()
        else:
            print 'closing file'
            p1.stdin.close()
            to_write = []
print b
In this case, one of two results occurs:
write, write, close file, read -> success
write, read -> hang (p2.stdout.read(1024) waits for 1024 bytes or EOF, but p1.stdin is still open, so neither ever arrives)
So again, I challenge nosklo to either post code showing the use of select.select to handle arbitrary input and pipe buffering from a single thread, or to upvote my responses.
Bottom line: don't try to manipulate both ends of a pipe from a single thread. It's just not worth it. See pipeline for a nice low-level example of how to do this correctly.
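As a rough illustration of this bottom line (not the pipeline example referred to above), here is a minimal sketch written for Python 3 rather than the Python 2 of the scripts above, assuming grep and cut are on the PATH: a helper thread feeds the head of the same grep | cut pipeline while the main thread reads its tail. The function name run_pipeline is hypothetical.

```python
import threading
from subprocess import Popen, PIPE


def run_pipeline(data: bytes) -> bytes:
    """Hypothetical helper: pipe data through `grep -v not | cut -c 1-10`.

    A writer thread owns the head of the pipeline and the calling thread
    owns the tail, so each thread only ever blocks on its own end.
    """
    p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
    p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE)
    # The parent no longer needs its copy of grep's stdout; only cut reads it.
    p1.stdout.close()

    def feed() -> None:
        try:
            # Blocks whenever the pipe is full, until the reader drains it.
            p1.stdin.write(data)
        finally:
            # Closing stdin sends EOF to grep, which lets the pipeline finish.
            p1.stdin.close()

    writer = threading.Thread(target=feed)
    writer.start()
    output = p2.stdout.read()  # reads until cut closes its stdout (EOF)
    writer.join()
    p2.stdout.close()
    p1.wait()
    p2.wait()
    return output


if __name__ == "__main__":
    out = run_pipeline(b"hello world\n" * 100000)
    print(len(out.splitlines()), out[:11])
```

Because each thread blocks only on its own end, a full pipe buffer just pauses the writer until the reader catches up, so no chunk size has to be guessed.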