Python wait until data is in sys.stdin


Question


My problem is the following:

My Python script receives data via sys.stdin, but it needs to wait until new data is available on sys.stdin.

As described in the Python man page, I use the following code, but it completely overloads my CPU.

#!/usr/bin/python -u
import sys
while 1:
    for line in sys.stdin.readlines():
        pass  # do something useful

Is there any good way to solve the high CPU usage?

Edit:

None of your solutions work, so here is my problem exactly.

You can configure the Apache2 daemon so that it sends every log line to a program instead of writing it to a log file.

That looks something like this:

CustomLog "|/usr/bin/python -u /usr/local/bin/client.py" combined

Apache2 expects my script to run permanently, wait for data on sys.stdin, and parse it whenever data arrives.

If I only use a for loop, the script will exit, because at some point there is no data in sys.stdin, and Apache2 will complain that the script exited unexpectedly.

If I use a while-true loop, my script uses 100% CPU.
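
To test such a script outside of Apache, one option (just a sketch; the sample log line and the feeding interval are made up) is a small harness that keeps the child's stdin open and writes one line at a time, which mimics Apache's piped-log behaviour:

# test_feed.py -- hypothetical harness: keeps client.py's stdin open and
# writes one fake log line per second, roughly like Apache's piped logging.
import subprocess
import time

proc = subprocess.Popen(
    ["/usr/bin/python", "-u", "/usr/local/bin/client.py"],
    stdin=subprocess.PIPE,
)
try:
    for _ in range(10):
        proc.stdin.write(b'127.0.0.1 - - [18/Dec/2019:03:16:49 +0000] "GET / HTTP/1.1" 200 123\n')
        proc.stdin.flush()
        time.sleep(1)
finally:
    proc.stdin.close()  # closing the pipe sends EOF, just like an Apache restart/shutdown
    proc.wait()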


Answer 1:


The following should just work.

import sys
for line in sys.stdin:
    pass  # whatever

Rationale:

The code will iterate over the lines of stdin as they come in. If the stream is still open but there isn't a complete line yet, the loop will hang until either a newline character is encountered (and the whole line is returned) or the stream is closed (and whatever is left in the buffer is returned).

Once the stream has been closed, no more data can be written to or read from stdin. Period.

The reason your code was overloading your CPU is that once stdin has been closed, any subsequent attempt to iterate over stdin returns immediately without doing anything. In essence, your code was equivalent to the following.

for line in sys.stdin:
    pass  # do something

while 1:
    pass  # infinite loop, very CPU intensive

Maybe it would be useful if you posted how you were writing data to stdin.

EDIT:

Python will (for the purposes of for loops, iterators, and readlines()) consider a stream closed when it encounters an EOF character. You can ask Python to read more data after this, but you cannot use any of the previous methods. The Python man page recommends using:

import sys
while True:
    line = sys.stdin.readline()
    # do something with line

When an EOF character is encountered, readline will return an empty string. The next call to readline will function as normal if the stream is still open. You can test this yourself by running the command in a terminal. Pressing Ctrl+D will cause the terminal to write the EOF character to stdin. This will cause the first program in this post to terminate, but the last program will continue to read data until the stream is actually closed. The last program should not peg your CPU at 100%, as readline will wait until there is data to return rather than returning an empty string.
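
A small sketch (assuming stdin is a pipe whose writer will not reopen it) of how you might combine the two behaviours: block in readline(), and treat an empty return value, as opposed to '\n' for a blank line, as the signal that the stream has been closed, which avoids the busy loop:

import sys

while True:
    line = sys.stdin.readline()
    if line == "":              # readline() returns '' only at EOF; a blank line is '\n'
        break                   # the writer closed the pipe, so stop
    sys.stdout.write(line)      # do something useful with the line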

I only have the problem of a busy loop when I try readline from an actual file. But when reading from stdin, readline happily blocks.
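
To illustrate the difference (just a sketch, and the log path is only an example): readline() on a regular file returns '' immediately once the end of the file is reached, so the loop below would spin without the sleep, whereas the same readline() on a pipe simply blocks until more data arrives.

import sys
import time

# Tail-style reading of a regular file: readline() does not block at EOF,
# it returns '' right away, so a short sleep is needed to avoid a busy loop.
f = open("/var/log/apache2/access.log")  # example path
f.seek(0, 2)                             # start at the current end of the file
while True:
    line = f.readline()
    if not line:
        time.sleep(0.1)
        continue
    sys.stdout.write(line)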




Answer 2:


Use this:

#!/usr/bin/python
import sys
for line in sys.stdin.readlines():
    pass # do something useful



Answer 3:


Well, I will stick with these lines of code for now.

#!/usr/bin/python
import sys
import time
while 1:
    time.sleep(0.01)
    for line in sys.stdin:
        pass # do something useful

If I don't use time.sleep, the script creates too high a CPU load.

If I use:

for line in sys.stdin.readline():

it will only parse one line every 0.01 seconds, and the performance of Apache2 is really bad.

Thank you very much for your answers.

Best regards, Abalus




Answer 4:


I've come back to this problem after a long time. The issue appears to be that Apache treats a CustomLog like a file: something it can open, write to, close, and then reopen at a later date. This causes the receiving process to be told that its input stream has been closed. However, that doesn't mean the process's input stream cannot be written to again, just that whichever process was writing to the input stream will not be writing to it again.

The best way to deal with this is to set up a handler and let the OS know to invoke the handler whenever input is written to standard input. Normally you should avoid relying heavily on OS signal handling, as it is relatively expensive. However, copying a megabyte of text to the following program only produced two SIGIO events, so it's okay in this case.

fancyecho.py

import sys
import os
import signal
import fcntl
import threading

io_event = threading.Event()

# Event handlers should generally be as compact as possible.
# Here all we do is notify the main thread that input has been received.
def handle_io(signal, frame):
    io_event.set()

# invoke handle_io on a SIGIO event
signal.signal(signal.SIGIO, handle_io)
# send io events on stdin (fd 0) to our process 
assert fcntl.fcntl(0, fcntl.F_SETOWN, os.getpid()) == 0
# tell the os to produce SIGIO events when data is written to stdin
assert fcntl.fcntl(0, fcntl.F_SETFL, os.O_ASYNC) == 0

print("pid is:", os.getpid())
while True:
    data = sys.stdin.read()
    io_event.clear()
    print("got:", repr(data))
    io_event.wait()

Here is how you might use this toy program. The output has been cleaned up because input and output were interleaved.

$ echo test | python3 fancyecho.py &
[1] 25487
pid is: 25487
got: 'test\n'
$ echo data > /proc/25487/fd/0
got: 'data\n'
$



Answer 5:


This actually works flawlessly (i.e. no runaway CPU) when you call the script from the shell, like so:

tail -f input-file | yourscript.py

Obviously, that is not ideal, since you then have to write all the relevant output to that file, but it works without much overhead! Namely, I think, because of using readline():

import sys
while 1:
    line = sys.stdin.readline()

It will actually stop and wait at that line until it gets more input.

Hope this helps someone!




Answer 6:


I know I am bringing old stuff back to life, but this seems to be one of the top hits on the topic. The solution Abalus settled on has a fixed time.sleep in each cycle, regardless of whether stdin is actually empty and the program should be idling, or there are a lot of lines waiting to be processed. A small modification makes the program process all pending messages rapidly and wait only when the queue is actually empty. That way, only a line that arrives during the sleep period has to wait; the others are processed without any lag.

This example simply reverses the input lines. If you submit only one line, it responds within a second (or whatever your sleep period is set to), but it can also process something like "ls -l | reverse.py" really quickly. The CPU load of this approach is minimal, even on embedded systems like OpenWrt.

import sys
import time

while True:
    line = sys.stdin.readline().rstrip()
    if line:
        sys.stdout.write(line[::-1] + '\n')
    else:
        sys.stdout.flush()
        time.sleep(1)
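
For example, a hypothetical session (after printing, the script keeps waiting for more input, so stop it with Ctrl+C):

$ printf 'hello\nworld\n' | python reverse.py
olleh
dlrow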



Answer 7:


I have been having a similar problem, where Python waits for the sender (whether a user or another program) to close the stream before the loop starts executing. I had solved it, but it was clearly non-Pythonic, as I had to resort to while True: and sys.stdin.readline().

I eventually found a reference, in a comment on another post, to a module called io, which provides an alternative to the standard file object. In Python 3 this is the default. From what I can make out, Python 2 treats stdin like a normal file and not a stream.

Try this, it worked for me:

import io
import sys

sys.stdin = io.open(sys.stdin.fileno())  # default is line buffering, good for user input

for line in sys.stdin:
    pass  # Do stuff with line



Answer 8:


I know this is an old thread, but I stumbled upon the same problem and figured out that it had more to do with how the script was invoked than with the script itself. At least in my case it turned out to be a problem with the 'system shell' on Debian (i.e. what /bin/sh is linked to; this is what Apache uses to execute the command that CustomLog pipes to). More info here: http://www.spinics.net/lists/dash/msg00675.html

hth, - steve




Answer 9:


This works for me; the code of /tmp/alog.py is:

#! /usr/bin/python

import sys

fout = open("/tmp/alog.log", "a")

while True:
    dat = sys.stdin.readline()
    fout.write(dat)
    fout.flush()

In httpd.conf:

CustomLog "|/tmp/alog.py" combined

The key is not to use

for dat in sys.stdin:

You will wait there and get nothing. And for testing, remember fout.flush(), otherwise you may not see any output. I tested on Fedora 15, Python 2.7.1, Apache 2.2: there is no CPU load, and alog.py stays resident in memory; if you run ps you can see it.



Source: https://stackoverflow.com/questions/7056306/python-wait-until-data-is-in-sys-stdin
