Question
I have four queues, each served by multiple processes/threads, that are interdependent in the following way:
- Queue 1 reads a file from disk and copies it into RAM
- Queue 2 takes the file in RAM and performs an operation on it
- Queue 3 takes the result of Queue 2 and performs a separate operation on it
- Queue 4 writes the end result back to disk
I would like these 4 queues to operate in parallel as much as possible, with the caveat that Queue 2 has to wait for Queue 1 to place at least one item on it (and similarly, Queue 2 has to place items on Queue 3, and Queue 3 on Queue 4).
What is the best way in Python to go about implementing this (both for the queue and for the thread/process implementation)?
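For the threaded route, one possible shape is a chain of `queue.Queue` objects with a small pool of worker threads per stage, shut down with a sentinel value. This is an illustrative sketch only: the two transforms (`upper()` and a byte reversal), the `.out` output naming, and the helper names `start_workers` and `pipeline` are hypothetical stand-ins for the real operations. The queues between stages are bounded so a fast reader cannot pull every file into RAM ahead of the slower stages.

```python
import os
import queue
import tempfile
import threading

SENTINEL = None  # end-of-stream marker; real items are never None


def read_file(path):
    # Stage 1: read the whole file into memory.
    with open(path, "rb") as f:
        return path, f.read()


def transform_a(item):
    # Stage 2: hypothetical placeholder operation.
    path, data = item
    return path, data.upper()


def transform_b(item):
    # Stage 3: hypothetical placeholder operation on stage 2's output.
    path, data = item
    return path, data[::-1]


def write_result(item):
    # Stage 4: write the end result next to the input (hypothetical ".out" suffix).
    path, data = item
    out_path = path + ".out"
    with open(out_path, "wb") as f:
        f.write(data)
    return out_path


def start_workers(func, inbox, outbox, n_workers):
    """Start n_workers threads that apply func to inbox items until each sees SENTINEL."""
    def worker():
        while True:
            item = inbox.get()
            if item is SENTINEL:
                return
            outbox.put(func(item))

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(n_workers)]
    for t in threads:
        t.start()
    return threads


def pipeline(paths, workers_per_stage=(2, 2, 2, 2)):
    stages = [read_file, transform_a, transform_b, write_result]
    # queues[i] feeds stage i and is bounded to limit memory use;
    # the final queue only collects output paths, so it is unbounded.
    queues = [queue.Queue(maxsize=8) for _ in stages] + [queue.Queue()]
    threads = [
        start_workers(func, queues[i], queues[i + 1], n)
        for i, (func, n) in enumerate(zip(stages, workers_per_stage))
    ]
    for path in paths:
        queues[0].put(path)
    # Shut down stage by stage. Whatever feeds queues[i] (this thread for i == 0,
    # stage i-1 otherwise) has finished, so all real items are ahead of the sentinels.
    for i, stage_threads in enumerate(threads):
        for _ in stage_threads:
            queues[i].put(SENTINEL)
        for t in stage_threads:
            t.join()
    return sorted(queues[-1].get() for _ in range(queues[-1].qsize()))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(5):
            path = os.path.join(tmp, f"in{i}.txt")
            with open(path, "wb") as f:
                f.write(b"abc%d" % i)
            paths.append(path)
        for out in pipeline(paths):
            with open(out, "rb") as f:
                print(os.path.basename(out), f.read())
```

Each stage starts working as soon as its input queue holds an item, which gives the "wait for at least one item" behaviour without extra synchronization. Error handling is omitted: if a stage function raises, that worker thread exits and its item is dropped.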
Will Queue 2 and Queue 3 block each other because of the GIL if I use threads? I read that I/O and computation can still happen in parallel, so I would be OK as long as Queues 1, 2, and 4 can work in parallel, even if Queue 3 runs sequentially with Queue 2.
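On the GIL question: in CPython, threads running pure-Python computation hold the GIL and take turns, so threaded stages 2 and 3 would largely run one at a time, while blocking file reads and writes release the GIL and can overlap with them. (Extension code such as NumPy may release the GIL during heavy operations, in which case threads can help for computation too.) Below is a minimal sketch of one hybrid, assuming the operations are CPU-bound pure Python: stages 2 and 3 each run in their own `multiprocessing.Process`, connected by `multiprocessing.Queue`, while stage 1 reads in the main thread and stage 4 writes from a helper thread. The transforms, the `.out` suffix, and the names `stage_worker`, `writer`, and `run` are hypothetical.

```python
import multiprocessing as mp
import os
import tempfile
import threading

SENTINEL = None  # end-of-stream marker passed from stage to stage


def transform_a(item):
    # Stage 2: hypothetical CPU-bound operation.
    path, data = item
    return path, data.upper()


def transform_b(item):
    # Stage 3: hypothetical CPU-bound operation on stage 2's output.
    path, data = item
    return path, data[::-1]


def stage_worker(func, inbox, outbox):
    # Runs in a child process, which has its own interpreter and its own GIL.
    for item in iter(inbox.get, SENTINEL):
        outbox.put(func(item))
    outbox.put(SENTINEL)  # tell the next stage that no more items are coming


def writer(outbox, written):
    # Stage 4: a thread in the parent; blocking file writes release the GIL.
    for path, data in iter(outbox.get, SENTINEL):
        out_path = path + ".out"  # hypothetical output naming
        with open(out_path, "wb") as f:
            f.write(data)
        written.append(out_path)


def run(paths):
    q1, q2, q3 = mp.Queue(maxsize=8), mp.Queue(maxsize=8), mp.Queue(maxsize=8)
    procs = [
        mp.Process(target=stage_worker, args=(transform_a, q1, q2)),
        mp.Process(target=stage_worker, args=(transform_b, q2, q3)),
    ]
    for p in procs:  # start the processes before any extra threads exist
        p.start()
    written = []
    writer_thread = threading.Thread(target=writer, args=(q3, written))
    writer_thread.start()
    for path in paths:  # Stage 1: read files in the main thread
        with open(path, "rb") as f:
            q1.put((path, f.read()))
    q1.put(SENTINEL)
    writer_thread.join()  # q3 is fully drained before the processes are joined
    for p in procs:
        p.join()
    return sorted(written)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(4):
            path = os.path.join(tmp, f"in{i}.txt")
            with open(path, "wb") as f:
                f.write(b"abc%d" % i)
            paths.append(path)
        print([os.path.basename(p) for p in run(paths)])
```

Two caveats: every item is pickled when it crosses a `multiprocessing.Queue`, which can dominate the cost for large files, and this sketch uses exactly one process per CPU-bound stage, so one forwarded sentinel is enough. With several processes per stage, the downstream stage would have to wait for a sentinel from every upstream process. The `if __name__ == "__main__":` guard is required where processes are started with spawn (Windows, macOS).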
Answer 1:
Is there any particular reason you actually need each of those 4 steps to be a separate thread/process? Personally, I'd just implement all 4 steps in one function or callable class, and then use multiprocessing.Pool's map to invoke that function in parallel over the filenames of interest.
A simpler example of this sort of pattern (just reading and processing) is discussed in this Q&A. As the answer there notes, if it appears to bottleneck on I/O rather than processing, just create more processes in the pool.
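A sketch of this suggestion, assuming each file can be processed independently of the others: one module-level function does all four steps for a single file, and `multiprocessing.Pool.map` runs it across the filenames. The two operations, the `.out` naming, and the names `process_file` and `process_all` are hypothetical placeholders.

```python
import multiprocessing as mp
import os
import tempfile


def process_file(path):
    """Run all four steps for one file inside a single worker process."""
    with open(path, "rb") as f:  # step 1: read the file into RAM
        data = f.read()
    data = data.upper()  # step 2: hypothetical first operation
    data = data[::-1]  # step 3: hypothetical second operation
    out_path = path + ".out"  # hypothetical output naming
    with open(out_path, "wb") as f:  # step 4: write the end result back to disk
        f.write(data)
    return out_path


def process_all(paths, processes=None):
    # processes=None makes Pool default to the CPU count; pass a larger number
    # if the workers turn out to spend most of their time waiting on I/O.
    with mp.Pool(processes=processes) as pool:
        return pool.map(process_file, paths)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(6):
            path = os.path.join(tmp, f"in{i}.txt")
            with open(path, "wb") as f:
                f.write(b"abc%d" % i)
            paths.append(path)
        # Oversubscribe the pool (here 2 workers per CPU) for I/O-heavy work.
        outs = process_all(paths, processes=2 * (os.cpu_count() or 1))
        print([os.path.basename(p) for p in outs])
```

`Pool.map` returns results in input order. Raising `processes` above the CPU count follows the advice for an I/O bottleneck; the best number is something to measure on the actual workload rather than assume.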
Source: https://stackoverflow.com/questions/27515167/python-interdependent-process-thread-queues