How to speed up communication with subprocesses

佛祖请我去吃肉 2021-01-02 02:11

I am using the Python 2 subprocess module with threading threads to take standard input and process it with binaries A, B, and C

5 answers
  •  无人及你
    2021-01-02 02:40

    I think you are just being misled by the way cProfile works. For example, here's a simple script that uses two threads (the main thread and one worker):

    #!/usr/bin/python
    
    import threading
    import time
    
    def f():
        time.sleep(10)
    
    
    def main():
        t = threading.Thread(target=f)
        t.start()
        t.join()
    

    If I test this using cProfile, here's what I get:

    >>> import test
    >>> import cProfile
    >>> cProfile.run('test.main()')
             60 function calls in 10.011 seconds
    
       Ordered by: standard name
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000   10.011   10.011 <string>:1(<module>)
            1    0.000    0.000   10.011   10.011 test.py:10(main)
            1    0.000    0.000    0.000    0.000 threading.py:1008(daemon)
            2    0.000    0.000    0.000    0.000 threading.py:1152(currentThread)
            2    0.000    0.000    0.000    0.000 threading.py:241(Condition)
            2    0.000    0.000    0.000    0.000 threading.py:259(__init__)
            2    0.000    0.000    0.000    0.000 threading.py:293(_release_save)
            2    0.000    0.000    0.000    0.000 threading.py:296(_acquire_restore)
            2    0.000    0.000    0.000    0.000 threading.py:299(_is_owned)
            2    0.000    0.000   10.011    5.005 threading.py:308(wait)
            1    0.000    0.000    0.000    0.000 threading.py:541(Event)
            1    0.000    0.000    0.000    0.000 threading.py:560(__init__)
            2    0.000    0.000    0.000    0.000 threading.py:569(isSet)
            4    0.000    0.000    0.000    0.000 threading.py:58(__init__)
            1    0.000    0.000    0.000    0.000 threading.py:602(wait)
            1    0.000    0.000    0.000    0.000 threading.py:627(_newname)
            5    0.000    0.000    0.000    0.000 threading.py:63(_note)
            1    0.000    0.000    0.000    0.000 threading.py:656(__init__)
            1    0.000    0.000    0.000    0.000 threading.py:709(_set_daemon)
            1    0.000    0.000    0.000    0.000 threading.py:726(start)
            1    0.000    0.000   10.010   10.010 threading.py:911(join)
           10   10.010    1.001   10.010    1.001 {method 'acquire' of 'thread.lock' objects}
            2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
            4    0.000    0.000    0.000    0.000 {method 'release' of 'thread.lock' objects}
            4    0.000    0.000    0.000    0.000 {thread.allocate_lock}
            2    0.000    0.000    0.000    0.000 {thread.get_ident}
            1    0.000    0.000    0.000    0.000 {thread.start_new_thread}
    

    As you can see, it says that almost all of the time is spent acquiring locks. That is not an accurate picture of what the script was doing: all of the time was actually spent in the time.sleep call inside f(). The acquire call has a high tottime only because join was waiting for f to finish, which means the main thread sat waiting to acquire a lock. cProfile shows no time being spent in f at all, because cProfile.run here only profiles the thread it is called from, and f runs in a different thread. With example code this simple it is easy to see what is really happening, but in a more complicated program this output is very misleading.
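
    If you want to stay with cProfile, one possible workaround is to run the thread's target under its own profiler inside the worker thread. The following is only a minimal sketch under that assumption: the profiled helper and the results list are hypothetical names introduced for illustration, the sleep is shortened so the example finishes quickly, and it uses a single worker because running several cProfile profilers at the same time may not be supported on every Python version.

```python
import cProfile
import pstats
import threading
import time


def f():
    time.sleep(0.2)  # stand-in for the real work (10 s in the script above)


def profiled(target, results):
    """Wrap target so it runs under its own profiler inside the worker thread.

    Hypothetical helper, not part of cProfile or threading.
    """
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        try:
            # runcall enables the profiler, calls target, then disables it.
            return profiler.runcall(target, *args, **kwargs)
        finally:
            results.append(profiler)  # hand the collected data back to the caller
    return wrapper


def main():
    results = []
    t = threading.Thread(target=profiled(f, results))
    t.start()
    t.join()
    return results


if __name__ == "__main__":
    for profiler in main():
        # Print the five entries with the largest cumulative time.
        pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

    With this wrapper, the report for the worker thread should list f and the sleep call, which the profile of the main thread alone does not show.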

    You can get more reliable results by using another profiling library, like yappi:

    >>> import test
    >>> import yappi
    >>> yappi.set_clock_type("wall")
    >>> yappi.start()
    >>> test.main()
    >>> yappi.get_func_stats().print_all()
    
    Clock type: wall
    Ordered by: totaltime, desc
    
    name                                    #n         tsub      ttot      tavg
    <string>:1 <module>                     2/1        0.000025  10.00801  5.004003
    test.py:10 main                         1          0.000060  10.00798  10.00798
    ..2.7/threading.py:308 _Condition.wait  2          0.000188  10.00746  5.003731
    ..thon2.7/threading.py:911 Thread.join  1          0.000039  10.00706  10.00706
    ..ython2.7/threading.py:752 Thread.run  1          0.000024  10.00682  10.00682
    test.py:6 f                             1          0.000013  10.00680  10.00680
    ..hon2.7/threading.py:726 Thread.start  1          0.000045  0.000608  0.000608
    ..thon2.7/threading.py:602 _Event.wait  1          0.000029  0.000484  0.000484
    ..2.7/threading.py:656 Thread.__init__  1          0.000064  0.000250  0.000250
    ..on2.7/threading.py:866 Thread.__stop  1          0.000025  0.000121  0.000121
    ..lib/python2.7/threading.py:541 Event  1          0.000011  0.000101  0.000101
    ..python2.7/threading.py:241 Condition  2          0.000025  0.000094  0.000047
    ..hreading.py:399 _Condition.notifyAll  1          0.000020  0.000090  0.000090
    ..2.7/threading.py:560 _Event.__init__  1          0.000018  0.000090  0.000090
    ..thon2.7/encodings/utf_8.py:15 decode  2          0.000031  0.000071  0.000035
    ..threading.py:259 _Condition.__init__  2          0.000064  0.000069  0.000034
    ..7/threading.py:372 _Condition.notify  1          0.000034  0.000068  0.000068
    ..hreading.py:299 _Condition._is_owned  3          0.000017  0.000040  0.000013
    ../threading.py:709 Thread._set_daemon  1          0.000018  0.000035  0.000035
    ..ding.py:293 _Condition._release_save  2          0.000019  0.000033  0.000016
    ..thon2.7/threading.py:63 Thread._note  7          0.000020  0.000020  0.000003
    ..n2.7/threading.py:1152 currentThread  2          0.000015  0.000019  0.000009
    ..g.py:296 _Condition._acquire_restore  2          0.000011  0.000017  0.000008
    ../python2.7/threading.py:627 _newname  1          0.000014  0.000014  0.000014
    ..n2.7/threading.py:58 Thread.__init__  4          0.000013  0.000013  0.000003
    ..threading.py:1008 _MainThread.daemon  1          0.000004  0.000004  0.000004
    ..hon2.7/threading.py:569 _Event.isSet  2          0.000003  0.000003  0.000002
    

    With yappi, it's much easier to see that the time is being spent in f.

    I suspect that you'll find that in reality, most of your script's time is spent doing whatever work is being done in produceA, produceB, and produceC.
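
    A quick way to check that suspicion without any profiler is to record wall-clock time around each thread target. This is only a rough sketch: stage_a and stage_b are hypothetical stand-ins for functions like produceA and produceB (the real ones are not shown here), and timed, timings, and run_stages are names made up for this example.

```python
import threading
import time

timings = {}                      # stage name -> accumulated wall-clock seconds
timings_lock = threading.Lock()   # protects timings across threads


def timed(name, func):
    """Return a wrapper that adds func's wall-clock run time to timings[name]."""
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.time() - start
            with timings_lock:
                timings[name] = timings.get(name, 0.0) + elapsed
    return wrapper


def stage_a():
    time.sleep(0.2)  # hypothetical stand-in for produceA


def stage_b():
    time.sleep(0.1)  # hypothetical stand-in for produceB


def run_stages():
    threads = [
        threading.Thread(target=timed("stage_a", stage_a)),
        threading.Thread(target=timed("stage_b", stage_b)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with timings_lock:
        return dict(timings)


if __name__ == "__main__":
    for name, seconds in sorted(run_stages().items()):
        print("%s: %.3f s" % (name, seconds))
```

    Because the measurement happens inside each worker thread, the numbers reflect the time each stage actually took, including time spent blocked on I/O or on a subprocess.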
