Subprocess very slow when calling external egrep and less

左心房为你撑大大i submitted on 2019-12-13 06:08:15

Question


I'm trying to build a Python script that lets me dynamically build up the patterns passed to egrep -v and pipe the output into less (or more).
I want to use external egrep and less because the files I am processing are very large text files (500MB+). Reading them into a list first and processing everything natively in Python is very slow.

However, when I use os.system or subprocess.call, everything becomes very slow at the moment I exit less and control should return to the Python code.

My code should work like this:
1. I run ./myless.py messages_500MB.txt
2. The complete messages_500MB.txt is shown through less -FRX.
3. When I press 'q' to exit less -FRX, the Python code takes over and prompts the user for text to exclude. The user enters it and I add it to the list.
4. My Python code builds egrep -v 'exclude1' and pipes the output to less.
5. The user repeats step 3 and enters more text to exclude.
6. Now my Python code calls egrep -v 'exclude1|exclude2' messages_500MB.txt | less -FRX
7. And the process continues.
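Steps 4 to 6 can be sketched as a small helper that turns the list of exclusions into an egrep argument list. This is only an illustration: the name build_egrep_args is hypothetical, and it assumes the command is later run without a shell, so the pattern needs no quoting. The -e option and the -- separator are there so that a pattern or file name starting with a dash is not mistaken for an option.

```python
def build_egrep_args(path, excluded):
    """Return an argv list for egrep (hypothetical helper, not from the original post)."""
    if not excluded:
        # An empty pattern matches every line, so the complete file is shown.
        return ["egrep", "-e", "", "--", path]
    # Join all exclusions into one alternation and invert the match with -v.
    return ["egrep", "-v", "-e", "|".join(excluded), "--", path]
```

For example, build_egrep_args('messages_500MB.txt', ['exclude1', 'exclude2']) yields the argument list equivalent to egrep -v 'exclude1|exclude2' messages_500MB.txt.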

However, this does not work as expected.
* On my Mac, when the user presses q to exit less -FRX, it takes a few seconds for the raw_input prompt to be displayed.
* On a Linux machine, I get loads of 'egrep: writing output: Broken pipe' messages.
* On Linux only: if I press CTRL+C while in less -FRX, exiting less -FRX for some reason becomes much quicker (as intended). On Mac, CTRL+C breaks my Python program.

Here is a sample of my code:

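import subprocess
import sys

file = sys.argv[1]  # path of the large text file to page through
excluded = list()
myInput = ''
while myInput != 'q':
    # Join all exclusions into a single alternation for egrep -v
    grepText = '|'.join(excluded)
    if grepText == '':
        # An empty pattern matches every line, so the whole file is shown
        command = 'egrep "" ' + file + ' | less -FRX'
    else:
        command = 'egrep -v "' + grepText + '" ' + file + ' | less -FRX'

    # The shell runs the whole pipeline; the slow return happens here
    subprocess.call(command, shell=True)
    myInput = raw_input('Enter text to exclude, q to exit, # to see what is excluded: ')
    excluded.append(myInput)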

Any help would be much appreciated


Answer 1:


Actually, I figured out what the problem is.

I did some research on the error that is visible when running my script on Linux ("egrep: writing output: Broken pipe"), and that led me to the answer:
The issue is that when I run egrep -v 'xyz' file | less and then quit less, subprocess still continues to run egrep, and on large files (500MB+) this takes a while.

Apparently, subprocess treats the two programs separately and keeps running the first one (egrep) even after the second one (less) has exited.
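The behaviour described above can be reproduced without egrep or less. The following is a minimal sketch, assuming Python 3 and using two throwaway Python child processes as stand-ins: a "producer" that stays busy without writing anything (like egrep -v scanning a large file in which most lines are excluded) and a "consumer" that exits at once (like less after pressing q). The function name is hypothetical.

```python
import subprocess
import sys


def producer_outlives_consumer():
    """Return True if the producer is still running after the consumer has exited."""
    # Stand-in for egrep: busy for a long time, writes nothing to the pipe.
    producer = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        stdout=subprocess.PIPE,
    )
    # Stand-in for less: reads from the producer's pipe and exits immediately.
    consumer = subprocess.Popen(
        [sys.executable, "-c", "pass"],
        stdin=producer.stdout,
    )
    producer.stdout.close()  # the parent does not read from the pipe itself
    consumer.wait()
    still_running = producer.poll() is None
    # Clean up the stand-in producer so the demo does not wait 30 seconds.
    producer.terminate()
    producer.wait()
    return still_running
```

Nothing tells the producer that its reader has gone away until it next tries to write, so anything that waits for the whole pipeline also waits for the producer.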

To properly resolve my issue, I use something like this:

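import subprocess

command = 'egrep -v "something" <filename>'  # <filename> is a placeholder
cmd2 = ('less', '-FRX')
egrep = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
subprocess.check_call(cmd2, stdin=egrep.stdout)  # blocks until less exits
egrep.terminate()  # stop egrep as soon as less is gone
egrep.wait()  # reap the terminated child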

By piping the first process's output into the second process's stdin, I am now able to terminate egrep immediately when I exit less, and now my Python script is flying :)
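Putting the fix together with the loop from the question might look roughly like the sketch below. It assumes Python 3 (input instead of raw_input) and runs both programs without a shell; page_filtered and main are hypothetical names, and the handling of 'q', '#' and empty input is one possible reading of the prompt text rather than the original author's code.

```python
import subprocess
import sys


def page_filtered(egrep_args, pager_args=("less", "-FRX")):
    """Pipe egrep_args into pager_args and stop the first process once the pager exits."""
    egrep = subprocess.Popen(egrep_args, stdout=subprocess.PIPE)
    try:
        # call() blocks until the pager exits and returns its exit status.
        return subprocess.call(pager_args, stdin=egrep.stdout)
    finally:
        egrep.stdout.close()
        egrep.terminate()  # do not let egrep finish scanning the file
        egrep.wait()       # reap the child process


def main(argv=None):
    argv = sys.argv if argv is None else argv
    path = argv[1]
    excluded = []
    while True:
        if excluded:
            args = ["egrep", "-v", "-e", "|".join(excluded), "--", path]
        else:
            args = ["egrep", "-e", "", "--", path]  # empty pattern: show everything
        page_filtered(args)
        while True:
            text = input("Enter text to exclude, q to exit, # to see what is excluded: ")
            if text != "#":
                break
            print("Excluded so far:", excluded)
        if text == "q":
            return
        if text:
            excluded.append(text)
```

A script would call main() at its entry point. Because no shell is involved, the process that terminate() signals is egrep itself rather than an intermediate shell.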

Cheers,
Milos



Source: https://stackoverflow.com/questions/30048985/subprocess-very-slow-when-calling-external-egrep-and-less
