Question
I'm trying to build a Python script that lets me dynamically build up the arguments to egrep -v and pipe the output into less (or more).
The reason I want to use external egrep + less is that the files I am processing are very large text files (500MB+). Reading them into a list first and processing everything natively in Python is very slow.
However, when I use os.system or subprocess.call, everything becomes very slow at the moment I exit less and want to return to the Python code.
My code should work like this:
1. ./myless.py messages_500MB.txt
2. The less -FRX output of messages_500MB.txt is shown (complete file).
3. When I press 'q' to exit less -FRX, the Python code should take over and display a prompt for the user to enter text to be excluded. The user enters it and I add it to the list
4. My Python code builds up egrep -v 'exclude1' and pipes the output to less
5. The user repeats step 3 and enters another string to be excluded
6. Now my python code calls egrep -v 'exclude1|exclude2' messages_500MB.txt | less -FRX
7. And the process continues
However, this does not work as expected.
* On my Mac, when the user presses q to exit less -FRX, it takes a few seconds for the raw_input prompt to be displayed
* On a Linux machine, I get loads of 'egrep: writing output: Broken pipe' messages
* On Linux only, if I press CTRL+C while in less -FRX, exiting less -FRX for some reason becomes much quicker (as intended). On the Mac, the same thing breaks my Python program
Here is a sample of my code:
excluded = list()
myInput = ''
while myInput != 'q':
    grepText = '|'.join(excluded)
    if grepText == '':
        command = 'egrep "" ' + file + ' | less -FRX'
    else:
        command = 'egrep -v "' + grepText + '" ' + file + ' | less -FRX'
    subprocess.call(command, shell=True)
    myInput = raw_input('Enter text to exclude, q to exit, # to see what is excluded: ')
    excluded.append(myInput)
Any help would be much appreciated
Answer 1:
Actually, I figured out what the problem is.
I did some research on the error that is visible when running my script on Linux ("egrep: writing output: Broken pipe"), and that led me to the answer:
The issue is that when I run egrep -v 'xyz' file | less and then quit less, subprocess still continues to run egrep, and on large files (500MB+) this takes a while.
Apparently, the two programs in the pipeline run as separate processes, and the first one (egrep) keeps running even after the second one (less) has exited.
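The behaviour described above can be reproduced without a 500MB file. The following is only a minimal sketch, assuming Python 3: the two one-line child scripts are hypothetical stand-ins (a producer that stays busy in place of egrep, and a reader that quits early in place of less), and producer_survives_consumer is an illustrative name, not something from the original script.

```python
import subprocess
import sys

# Stand-in for egrep on a huge file: emits one line, then stays busy.
PRODUCER = "import sys, time; print('first line'); sys.stdout.flush(); time.sleep(30)"
# Stand-in for less: reads one line and quits, like pressing 'q' early.
CONSUMER = "import sys; sys.stdin.readline()"


def producer_survives_consumer():
    """Return True if the producer is still running after its reader exited."""
    producer = subprocess.Popen(
        [sys.executable, "-c", PRODUCER], stdout=subprocess.PIPE
    )
    # Blocks until the reader exits, just like waiting for the user to quit less.
    subprocess.call([sys.executable, "-c", CONSUMER], stdin=producer.stdout)
    still_running = producer.poll() is None
    # Clean up explicitly; nothing stops the producer on our behalf.
    producer.stdout.close()
    producer.terminate()
    producer.wait()
    return still_running
```

The point of the sketch is that the reader exiting does not by itself end the producer; the parent has to stop it.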
To properly resolve my issue, I use something like this:
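import subprocess
command = 'egrep -v "something" <filename>'
cmd2 = ('less', '-FRX')
# Start egrep on its own so that we keep a handle to it
egrep = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
# Run less on egrep's output; this returns when the user quits less
subprocess.check_call(cmd2, stdin=egrep.stdout)
# Stop egrep right away instead of letting it scan the rest of the file
egrep.terminate()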
By piping the first process's output into the second process's stdin, I am now able to terminate egrep immediately when I exit less, and now my Python script is flying :)
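Putting the fix together with the exclude loop from the question might look roughly like the sketch below. It is not the exact script: it assumes Python 3 (input instead of raw_input), the helper names build_egrep_args, handle_input, view_filtered and run are made up for illustration, and it swaps the shell command string for an argument list so that user-entered patterns need no shell quoting and terminate() acts on egrep itself rather than on an intermediate shell.

```python
import subprocess


def build_egrep_args(path, excluded):
    """Build the egrep argument list for the current exclusions."""
    if not excluded:
        # An empty pattern matches every line, as in the question's code.
        return ["egrep", "-e", "", path]
    # -e keeps a pattern that starts with '-' from being read as an option.
    return ["egrep", "-v", "-e", "|".join(excluded), path]


def handle_input(text, excluded):
    """Apply one prompt response; return False when the user wants to quit."""
    if text == "q":
        return False
    if text == "#":
        print("Excluded so far:", excluded)
    elif text:
        excluded.append(text)
    return True


def view_filtered(path, excluded, pager=("less", "-FRX")):
    """Show the filtered file in the pager and stop egrep when the pager exits."""
    egrep = subprocess.Popen(build_egrep_args(path, excluded),
                             stdout=subprocess.PIPE)
    try:
        subprocess.call(pager, stdin=egrep.stdout)
    finally:
        egrep.stdout.close()
        egrep.terminate()
        egrep.wait()


def run(path):
    """Alternate between viewing the file and asking for another exclusion."""
    excluded = []
    while True:
        view_filtered(path, excluded)
        answer = input("Enter text to exclude, q to exit, # to see what is excluded: ")
        if not handle_input(answer, excluded):
            break
```

Calling run('messages_500MB.txt') would start the loop. Unlike the code in the question, 'q' and '#' are not added to the exclusion list here.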
Cheers,
Milos
Source: https://stackoverflow.com/questions/30048985/subprocess-very-slow-when-calling-external-egrep-and-less