I am using Python 2 subprocess with threading threads to take standard input, process it with binaries A, B, and C
Since you mentioned popen() and pthreads in the comments, I guess you are on a POSIX system (maybe Linux).
Have you tried using subprocess32 instead of the standard subprocess library?
The Python 2 subprocess documentation strongly encourages POSIX users to use it, and it may bring some improvement.
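
Since subprocess32 is meant as a drop-in replacement, trying it can be as small as changing the import. Here is a minimal sketch, assuming subprocess32 has been installed separately (for example with pip); the fallback to the standard module and the test command are my own illustration, not something from your code:

```python
import sys

# Prefer the subprocess32 backport when it is available (Python 2 on POSIX);
# otherwise fall back to the standard library module.
try:
    import subprocess32 as subprocess
except ImportError:
    import subprocess

# Hypothetical stand-in command: run a child interpreter that prints "ok".
out = subprocess.check_output([sys.executable, "-c", "print('ok')"])
```

The rest of the code keeps calling subprocess.Popen and friends as before, so the swap is easy to revert if it does not help.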
PS: I believe mixing forks (subprocess) and threads is a bad idea.
PS2: Why doesn't python produceA.py | A | python produceB.py | B | python produceC.py | C fit your needs? Or its equivalent using subprocess?
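
For the subprocess equivalent of such a pipeline, something along these lines might work without any threads. It is only a sketch: run_pipeline is a hypothetical helper name, and the two commands are stand-ins for your produceA.py and binary A, which I don't have.

```python
import subprocess
import sys


def run_pipeline(commands):
    """Chain commands like a shell pipeline and return the last stdout."""
    procs = []
    prev_stdout = None
    for cmd in commands:
        proc = subprocess.Popen(cmd, stdin=prev_stdout, stdout=subprocess.PIPE)
        if prev_stdout is not None:
            # Close the parent's copy so the upstream process gets SIGPIPE
            # if the downstream one exits early.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    # Read everything the final stage writes, then reap the earlier stages.
    output, _ = procs[-1].communicate()
    for proc in procs[:-1]:
        proc.wait()
    return output


# Stand-ins (assumptions): a producer script and a filter reading stdin.
producer = [sys.executable, "-c", "print('hello')"]
to_upper = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read().upper())"]

result = run_pipeline([producer, to_upper])
```

The processes are connected directly through OS pipes, so Python only reads the final output and no data passes through Python threads in between.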