Python's Popen + communicate only returning the first line of stdout

Question

I'm trying to use my command-line git client and Python's I/O redirection in order to automate some common operations on a lot of git repos. (Yes, this is hack-ish. I might go back and use a Python library to do this later, but for now it seems to be working out ok :) )

I'd like to be able to capture the output of calling git. Hiding the output will look nicer, and capturing it will let me log it in case it's useful.

My problem is that I can't get more than the first line of output when I run a 'git clone' command. Weirdly, the same code with 'git status' seems to work just fine.

I'm running Python 2.7 on Windows 7 and I'm using the cmd.exe command interpreter.

My sleuthing so far:

When I call subprocess.call() with "git clone" it runs fine and I see the output on the console (which confirms that git is producing output, even though I'm not capturing it). This code:

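import os
import subprocess

dir = "E:\\Work\\etc\\etc"
os.chdir(dir)
git_cmd = "git clone git@192.168.56.101:Mike_VonP/bit142_assign_2.git"

# Run git and let its output go straight to the console (nothing is captured)
print "SUBPROCESS.CALL" + "="*20
ret = subprocess.call(git_cmd.split(), shell=True)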

will produce this output on the console:

SUBPROCESS.CALL====================
Cloning into 'bit142_assign_2'...
remote: Counting objects: 9, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 9 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (9/9), done.
Checking connectivity... done.

If I do the same thing with Popen directly, I see the same output on the console (which is also not being captured). This code:

# (the dir = , os.chdir, and git_cmd= lines are still executed here)
print "SUBPROCESS.POPEN" + "="*20
p=subprocess.Popen(git_cmd.split(), shell=True)
p.wait()

will produce this (effectively identical) output:

SUBPROCESS.POPEN====================
Cloning into 'bit142_assign_2'...
remote: Counting objects: 9, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 9 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (9/9), done.
Checking connectivity... done.

(Obviously I'm deleting the cloned repo between runs, otherwise I'd get an 'Everything is up to date' message)

If I use the communicate() method what I expect is to get a string that contains all the output that I'm seeing above. Instead I only see the line Cloning into 'bit142_assign_2'....
This code:

print "SUBPROCESS.POPEN, COMMUNICATE" + "="*20
p = subprocess.Popen(git_cmd.split(), shell=True,
                     bufsize=1,
                     stderr=subprocess.PIPE,
                     stdout=subprocess.PIPE)
# communicate() reads both pipes to EOF and waits for the process to exit
out, err = p.communicate()
print "StdOut:\n" + out
print "StdErr:\n" + err

will produce this output:

SUBPROCESS.POPEN, COMMUNICATE====================
StdOut:

StdErr:
Cloning into 'bit142_assign_2'...

On the one hand, I've redirected the output (as you can see from the fact that the rest of it no longer appears on the console), but on the other hand I'm only capturing that first line.

I've tried lots and lots of stuff (calling check_output instead of Popen, using pipes with subprocess.call, using pipes with subprocess.Popen, and probably other stuff I've forgotten about) but nothing works - I only ever capture that first line of output.

Interestingly, the exact same code does work correctly with 'git status'. Once the repo has been cloned, calling git status produces three lines of output (which collectively say 'everything is up to date') and that third example (the Popen + communicate code) does capture all three lines of output.

If anyone has any ideas about what I'm doing wrong or any thoughts on anything I could try in order to better diagnose this problem I would greatly appreciate it.

Answer 1:

Try adding the --progress option to your git command. This forces git to emit its progress status to stderr even when the git process is not attached to a terminal, which is the case when running git via the subprocess functions.

git_cmd = "git clone --progress git@192.168.56.101:Mike_VonP/bit142_assign_2.git"

print "SUBPROCESS.POPEN, COMMUNICATE" + "="*20
p = subprocess.Popen(git_cmd.split(), stderr=subprocess.PIPE, stdout=subprocess.PIPE)
# communicate() reads both pipes to EOF and waits for the process to exit
out, err = p.communicate()
print "StdOut:\n" + out
print "StdErr:\n" + err

N.B. I am unable to test this on Windows, but it is effective on Linux.

Also, it should not be necessary to specify shell=True and this might be a security problem, so it's best avoided.
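As a rough illustration of why the list form without shell=True is the safer default, here is a minimal sketch. It uses the Python interpreter itself (sys.executable) as a stand-in for git, and the helper name echo_argument is made up for this example. It is written so that it should run on Python 3 as well as 2.7.

```python
import subprocess
import sys


def echo_argument(arg):
    """Run a child Python process that prints its first argument verbatim."""
    # Hypothetical stand-in for "git ...": the child simply echoes argv[1].
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", arg]
    # No shell=True: each list item reaches the child as exactly one argument,
    # so no shell gets a chance to interpret characters such as ';' or '|'.
    proc = subprocess.Popen(cmd,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, _ = proc.communicate()
    return out.strip()


if __name__ == "__main__":
    print(echo_argument("repo name; echo injected"))
```

The argument arrives in the child unchanged, spaces and semicolon included, because no shell ever parses it.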

Answer 2:

There are two parts of interest here, one being Python-specific and one being Git-specific.

Python

When using the subprocess module, you can elect to control up to three I/O channels of the program you run: stdin, stdout, and stderr. This is true for subprocess.call and subprocess.check_call as well as subprocess.Popen, but both call and check_call immediately call the new process object's wait method, so for various reasons, it's unwise to supply subprocess.PIPE for the stdout and/or stderr with these two operations.¹

Other than that, using subprocess.call is equivalent to using subprocess.Popen. In fact, the code for call is a one-liner:

def call(*popenargs, **kwargs):
    return Popen(*popenargs, **kwargs).wait()

If you choose not to redirect any of the I/O channels, programs that read input get it from the same place Python would, programs that write output to stdout write it to the same place your own Python code would,² and programs that write output to stderr write it to the same place Python would.

You can, of course, redirect stdout and/or stderr to actual files, as well as to subprocess.PIPEs. Files and pipes are not interactive "terminal" or "tty" devices (i.e., are not seen as being directly connected to a human being). This leads us to Git.
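To see the "pipes are not terminals" point from the child's side, here is a small sketch. It again uses sys.executable as a stand-in child rather than git; CHILD_CODE and child_sees_tty are names invented for this example.

```python
import subprocess
import sys

# Child program: report whether its own stdout and stderr look like a terminal.
CHILD_CODE = ("import sys; "
              "sys.stdout.write('%s %s' % (sys.stdout.isatty(), sys.stderr.isatty()))")


def child_sees_tty():
    """Return what the child reports when both of its outputs are pipes."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, _ = proc.communicate()
    return out.strip()


if __name__ == "__main__":
    print(child_sees_tty())
```

With both outputs redirected to pipes the child reports that neither stream is a tty, which is the same kind of check a program can use to decide whether to print progress output.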

Git

Git programs may generally read from stdin and/or write to stdout and/or stderr. Git may also invoke additional programs, which may do the same, or may bypass these standard I/O channels.

In particular, git clone mainly writes to its stderr, as you have observed. Moreover, as mhawke answered, you must add --progress to make Git write progress messages to stderr when Git is not talking to an interactive tty device.
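Putting the two pieces together, a minimal sketch of capturing git clone output for logging might look like the following. The helper names build_clone_command and run_and_capture are made up for this example; only the --progress flag and the subprocess calls come from the discussion above.

```python
import subprocess


def build_clone_command(url, progress=True):
    """Build the argument list for 'git clone', optionally with --progress."""
    cmd = ["git", "clone"]
    if progress:
        # Ask git to report progress even though stderr will be a pipe.
        cmd.append("--progress")
    cmd.append(url)
    return cmd


def run_and_capture(cmd):
    """Run cmd without a shell; return (exit code, stdout text, stderr text)."""
    proc = subprocess.Popen(cmd,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    # communicate() drains both pipes and waits for the process to exit.
    out, err = proc.communicate()
    return proc.returncode, out, err
```

Calling run_and_capture(build_clone_command(url)) with your own repository URL should return the progress text in the third element, since git clone writes it to stderr rather than stdout.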

If Git needs a password or other authentication when cloning via https or ssh, Git will run an auxiliary program to get this. These programs, for the most part, bypass stdin entirely (by opening /dev/tty on POSIX systems, or the equivalent on Windows), so as to interact with the user. How well this will work, or whether it will work at all, in your automated environment is a good question (but again outside the scope of this answer). But this does bring us back to Python, because ...

Python

Besides the subprocess module, there are some external libraries, sh and pexpect, and some facilities built into Python itself via the pty module, that can open a pseudo-tty: an interactive tty device that, instead of being connected directly to a human, is connected to your program.

When using ptys, you can have Git behave identically to when it is talking directly to a human—in fact, "talking to a human" today is actually done with ptys (or equivalent) anyway, since there are programs running the various windowing systems. Moreover, programs that ask a human for a password may³ now interact with your own Python code. This can be good or bad (or even both), so consider whether you want that to happen.
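For the pty route, here is a minimal sketch using only the standard library. It is POSIX-only (the pty module is not available on Windows, so it does not apply directly to the setup in the question), the helper name run_with_pty is made up, and the child is again sys.executable rather than git.

```python
import os
import pty  # POSIX only
import subprocess
import sys


def run_with_pty(cmd):
    """Run cmd with its stdin/stdout/stderr attached to a pseudo-terminal."""
    master_fd, slave_fd = pty.openpty()
    try:
        proc = subprocess.Popen(cmd, stdin=slave_fd, stdout=slave_fd,
                                stderr=slave_fd)
    finally:
        # The parent only needs the master end.
        os.close(slave_fd)
    chunks = []
    try:
        while True:
            try:
                data = os.read(master_fd, 1024)
            except OSError:
                # Linux reports EIO once the child has closed the slave end.
                break
            if not data:
                break
            chunks.append(data)
    finally:
        os.close(master_fd)
    proc.wait()
    return proc.returncode, b"".join(chunks)


if __name__ == "__main__":
    child = "import sys; sys.stdout.write('%s' % sys.stdout.isatty())"
    print(run_with_pty([sys.executable, "-c", child]))
```

Here the child sees a terminal on its stdout, so a program that checks for a tty before printing progress would behave as it does interactively. Note that stdout and stderr arrive merged on the single pty.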

¹Specifically, the point of the communicate method is to manage I/O traffic between the up-to-three streams, if any or all of them are PIPE, without having the subprocess wedge. Imagine, if you will, a subprocess that prints 64K of text to stdout, then 64K of text to stderr, then another 64K of text to stdout, and then reads from stdin. If you try to read or write any of these in any specific order, the subprocess will "get stuck" waiting for you to clear something else, while you'll get stuck waiting for the subprocess to finish whichever one you chose to complete first. What communicate does instead is to use threads or OS-specific non-blocking I/O methods to feed the subprocess input while reading its stdout and stderr, all simultaneously.

In other words, it handles multiplexing. Thus, if you are not supplying subprocess.PIPE for at least two of the three I/O channels, it's safe to bypass the communicate method. If you are, it is not (unless you implement your own multiplexing).

There's a somewhat curious edge case here: if you supply subprocess.STDOUT for the stderr output, this tells Python to direct the two outputs of the subprocess into a single communications channel. This counts as only one pipe, so if you combine the subprocess's stdout and stderr, and supply no input, you can bypass the communicate method.
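A small sketch of both cases described in this footnote: two separate pipes drained by communicate, and stderr folded into stdout with subprocess.STDOUT. The child is a stand-in Python script that writes 64K to stdout, 64K to stderr, and another 64K to stdout (the stdin step from the scenario above is left out); CHILD_CODE, capture_separately and capture_merged are names invented for this example.

```python
import subprocess
import sys

# Child program: 64K to stdout, 64K to stderr, then 64K more to stdout.
CHILD_CODE = (
    "import sys\n"
    "sys.stdout.write('o' * 65536); sys.stdout.flush()\n"
    "sys.stderr.write('e' * 65536); sys.stderr.flush()\n"
    "sys.stdout.write('o' * 65536); sys.stdout.flush()\n"
)


def capture_separately():
    """Two pipes: communicate() reads both concurrently, so nothing wedges."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return len(out), len(err)


def capture_merged():
    """One pipe: stderr is redirected into the same channel as stdout."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    out, err = proc.communicate()
    return len(out), err


if __name__ == "__main__":
    print(capture_separately())
    print(capture_merged())
```

In the merged case communicate returns None for stderr, because there is no separate stderr pipe to read.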

²In fact, the subprocess inherits the process's stdin, stdout, and stderr, which may not match Python's sys.stdin, sys.stdout, and sys.stderr if you've over-ridden those. This gets into details probably best ignored here. :-)

³I say "may" instead of "will" because /dev/tty accesses the controlling terminal, and not all ptys are controlling terminals. This also gets complicated and OS-specific and is also beyond the scope of this answer.

Source: https://stackoverflow.com/questions/39564455/pythons-popen-communicate-only-returning-the-first-line-of-stdout

Tags: python, git, popen, communicate