Python Multiprocessing: Broken Pipe exception after increasing Pool size

余生长醉 posted on 2021-02-10 17:27:10


Below is the exception I get. The only change I made was increasing the pool count.


def parse(url):
    r = request.get(url)

with Pool(POOL_COUNT) as p:
    result = p.map(parse, links)

File "/usr/lib64/python3.5/multiprocessing/", line 130, in worker
    put((job, i, (False, wrapped)))
  File "/usr/lib64/python3.5/multiprocessing/", line 355, in put
  File "/usr/lib64/python3.5/multiprocessing/", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib64/python3.5/multiprocessing/", line 404, in _send_bytes
    self._send(header + buf)
  File "/usr/lib64/python3.5/multiprocessing/", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process ForkPoolWorker-26:
Traceback (most recent call last):
  File "/usr/lib64/python3.5/multiprocessing/", line 125, in worker
    put((job, i, result))
  File "/usr/lib64/python3.5/multiprocessing/", line 355, in put
  File "/usr/lib64/python3.5/multiprocessing/", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib64/python3.5/multiprocessing/", line 404, in _send_bytes
    self._send(header + buf)
  File "/usr/lib64/python3.5/multiprocessing/", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe


I was seeing a Broken Pipe exception too, although my case is more complicated.

One reason that increasing the pool size alone can lead to the exception is that more workers fetch more data through the requests module at the same time, which can exhaust memory. A worker can then segfault, especially if you have little swap space.

Edit1: I believe it is caused by memory usage. Too many pool processes used up too much memory and the pool eventually broke. It is very hard to debug; I limited my own pool size to 4 because I have little RAM and large packages.


This simplified version of your code works perfectly here with any value of POOL_COUNT:

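from multiprocessing import Pool

POOL_COUNT = 8  # example value (an assumption); any pool size behaved the same

def parse(url):
    r = url  # no network request, so only the pool machinery is exercised

if __name__ == "__main__":
    with Pool(processes=POOL_COUNT) as p:
        links = [str(i) for i in range(POOL_COUNT)]
        result = p.map(parse, links)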

Doesn't it? If so, the problem should be in the request part. Maybe it needs a sleep?


I tried to reproduce this on an AWS t2.small instance (2 GB RAM, as you described) with the following script. Note that you missed an s in requests.get(), assuming you are using the requests library, and the return was also missing:

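from multiprocessing import Pool
import requests

def parse(url):
    a = requests.get(url)
    if a.status_code != 200:
        print(a)  # the body of this branch is missing in the source; printing is an assumption
    return a.text

links = ['' for i in range(1000)]  # the URL is elided in the source
with Pool(POOL_COUNT) as p:
    result = p.map(parse, links)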

Sadly, I didn't run into the same issue as you did.

From the stack trace you posted, the failure happens in the pool's worker code, at the point where a worker writes its result to the pipe (put((job, i, result))), not in the requests module itself. It looks like a worker process cannot send its result back to the main process.
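
For context, BrokenPipeError ([Errno 32]) is what Python raises when a process writes to a pipe whose reading end has already been closed. The following is only a minimal sketch of that mechanism using a plain os.pipe on a POSIX system; it is not the pool's own code, and the function name write_after_reader_closed is made up for illustration:

```python
import errno
import os


def write_after_reader_closed():
    """Write to a pipe after its read end is closed and return the errno."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # simulate the receiving side going away
    try:
        os.write(write_fd, b"result")
    except BrokenPipeError as exc:
        # Python ignores SIGPIPE by default, so the failed write surfaces
        # as an exception instead of killing the process.
        return exc.errno
    finally:
        os.close(write_fd)
    return None


err = write_after_reader_closed()
print(err == errno.EPIPE)
```

In the pool case the write happens inside the worker's put() call, so the same error would appear whenever the process on the other end of that pipe is no longer reading.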

Anyway: this operation is not CPU bound. The bottleneck is the network (most probably the remote server's maximum number of connections), so you are much better off using multithreading. It is most probably also faster, because a process pool needs to communicate between the processes, which means that the return value of parse has to be pickled and then sent to the main process.
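
To get a feel for the pickling cost, here is a rough sketch that measures how many bytes would have to cross the pipe for one result. The functions parse_full and parse_summary are hypothetical stand-ins (no real request is made), and the 1 MB body size is an assumption chosen only for illustration:

```python
import pickle

BODY_SIZE = 10 ** 6  # assumed size of one page body, in characters


def parse_full(url):
    # Stand-in for a worker that returns the whole page body.
    return "x" * BODY_SIZE


def parse_summary(url):
    # Stand-in for a worker that returns only a small value (the body length).
    body = "x" * BODY_SIZE
    return len(body)


# A process pool pickles each return value before sending it to the parent.
full_bytes = len(pickle.dumps(parse_full("page")))
summary_bytes = len(pickle.dumps(parse_summary("page")))
print(full_bytes, summary_bytes)
```

With threads, no pickling is needed at all, since the workers share memory with the main thread.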

To try with threads instead of processes, simply do from multiprocessing.pool import ThreadPool and replace Pool with ThreadPool in your code.
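
A minimal sketch of that swap is shown below. To keep it runnable without network access, parse here is a stand-in that sleeps instead of calling requests.get(url); the link names, the POOL_COUNT value, and the fetch_all helper are assumptions for illustration only:

```python
import time
from multiprocessing.pool import ThreadPool

POOL_COUNT = 8  # hypothetical value; tune it for your own workload


def parse(url):
    # Stand-in for requests.get(url).text; the sleep simulates network wait.
    time.sleep(0.01)
    return "body of " + url


def fetch_all(links, pool_count=POOL_COUNT):
    # ThreadPool has the same interface as Pool, but runs workers as threads,
    # so results are not pickled or sent through a pipe.
    with ThreadPool(pool_count) as p:
        return p.map(parse, links)


links = ["page-%d" % i for i in range(20)]
result = fetch_all(links)
print(len(result))
```

As with Pool.map, the results come back in the same order as the input links.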

