(1) whether the "second way" will be slower than "first way"
Starting a new process is an expensive operation, so there should not be a large difference between letting the shell parse the command line and start the child processes and doing it yourself in Python. The only benchmark that matters is your code on your hardware. Measure it.
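One way to measure it is sketched below. This is a minimal illustration, not your actual pipeline: the input DATA, the commands `sort -n -r | head -n 1`, and the function names are stand-ins I made up, and it assumes a POSIX system with sort and head on the PATH.

```python
import subprocess
import timeit

DATA = b"3\n1\n2\n"  # hypothetical stand-in input


def via_shell(data=DATA):
    # The shell parses the string and starts both child processes.
    return subprocess.check_output("sort -n -r | head -n 1",
                                   shell=True, input=data)


def via_popen(data=DATA):
    # Python starts both child processes and connects them itself.
    p1 = subprocess.Popen(["sort", "-n", "-r"],
                          stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(["head", "-n", "1"],
                          stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()  # parent drops its copy of the pipe's read end
    p1.stdin.write(data)
    p1.stdin.close()
    out = p2.communicate()[0]
    p1.wait()
    return out


def compare(repeat=20):
    # Mean wall-clock seconds per call for each variant.
    return {f.__name__: timeit.timeit(f, number=repeat) / repeat
            for f in (via_shell, via_popen)}


if __name__ == "__main__":
    for name, seconds in compare().items():
        print("%s: %.6f s per call" % (name, seconds))
```

Swap in your real commands and input files before drawing any conclusion; the numbers from a toy input like this say little about a large sort.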
(2) if I have to write in "first way" anyway (because it's faster to write), how can I avoid complaints like broken pipe
The first "broken pipe" might be similar to: 'yes' reporting error with subprocess communicate(). Try the workaround I've provided there.
The second broken pipe can be fixed by redirecting the pipeline's stdout to the mid file:
from subprocess import check_call

with open(mid, 'wb') as file:
    check_call(pipeline, shell=True, stdout=file)
This performs the > {2} redirection from your command in Python rather than in the shell.
(3) what might be the most compelling reason that I shouldn't write in "first way"
If any of top_count, extend, mid, or summit comes from a source that is not completely under your control, then you risk running an arbitrary command under your user account.
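To make the risk concrete, here is a small sketch that only builds the command strings and never runs them. The file names and the two helper functions are hypothetical. shlex.quote is shown only to illustrate what the shell would otherwise see; avoiding the shell altogether is still the safer route.

```python
import shlex


def unsafe_command(summit, mid):
    # Values are pasted into the shell command line verbatim.
    return "sort -n -r -k5 {} > {}".format(summit, mid)


def quoted_command(summit, mid):
    # Each value is quoted so the shell treats it as a single word.
    return "sort -n -r -k5 {} > {}".format(shlex.quote(summit),
                                           shlex.quote(mid))


# Hypothetical hostile "file name" supplied by someone else.
hostile = "peaks.bed; echo INJECTED #"

print(unsafe_command(hostile, "mid.bed"))  # the shell would see two commands
print(quoted_command(hostile, "mid.bed"))  # the shell would see one file name
```

In the unquoted version everything after the semicolon would run as a separate command if the string were passed with shell=True.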
The plumbum module provides both security and readability (measure the time performance if it matters to you in this case):
from plumbum.cmd import awk, head, sort
# Parenthesize the tuple: without it, % is applied before * and raises TypeError.
awk_cmd = 'OFS="\t"{if($2-%s>0){print $1,$2-%s,$3+%s,$4,$5}}' % ((extend/2,)*3)
(sort["-n", "-r", "-k5", summit] | head["-n", "500"] | awk[awk_cmd] > mid)()
See also: How do I use subprocess.Popen to connect multiple processes by pipes?
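For reference, a standard-library version of such a pipeline might look roughly like this. It is a sketch under assumptions: run_pipeline is a helper name I made up, each command is given as an argument list, and the output of the last command goes to out_path (the equivalent of > mid).

```python
from subprocess import Popen, PIPE


def run_pipeline(commands, out_path):
    """Run argv lists as `cmd1 | cmd2 | ... > out_path` without a shell."""
    procs = []
    with open(out_path, "wb") as dst:
        prev = None  # the first command inherits the parent's stdin
        for i, argv in enumerate(commands):
            last = i == len(commands) - 1
            proc = Popen(argv, stdin=prev, stdout=dst if last else PIPE)
            if prev is not None:
                # Drop the parent's copy so an upstream command sees a
                # closed pipe if a downstream command exits early.
                prev.close()
            procs.append(proc)
            prev = proc.stdout
    # Exit status of every command, in pipeline order.
    return [p.wait() for p in procs]
```

With the variables from the question, the call would be along the lines of run_pipeline([["sort", "-n", "-r", "-k5", summit], ["head", "-n", "500"], ["awk", awk_cmd]], mid). Since no shell is involved, the values are passed to the programs as plain arguments and are never interpreted as shell syntax.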