How do connections recycle in a multiprocess pool serving requests from a single requests.Session object in python?

Submitted by Deadly on 2021-01-01 04:17:25

Question


Below is the complete code simplified for the question.

ids_to_check returns a list of IDs. For my testing, I used a list of 13 random strings.

#!/usr/bin/env python3
import time
from multiprocessing.dummy import Pool as ThreadPool, current_process as threadpool_process
import requests

def ids_to_check():
    some_calls()  # placeholder for the real lookup logic
    return id_list

def execute_task(task_id):  # renamed from `id` to avoid shadowing the builtin used below
    url = f"https://myserver.com/todos/{task_id}"
    json_op = s.get(url, verify=False).json()
    value = json_op['id']
    print(str(value) + '-' + str(threadpool_process()) + str(id(s)))

def main():
    pool = ThreadPool(processes=20)
    while True:
        pool.map(execute_task, ids_to_check())
        print("Let's wait for 10 seconds")
        time.sleep(10)

if __name__ == "__main__":
    s = requests.Session()
    s.headers.update({
      'Accept': 'application/json'
    })

    main()

Output:

4-<DummyProcess(Thread-2, started daemon 140209222559488)>140209446508360
5-<DummyProcess(Thread-5, started daemon 140209123481344)>140209446508360
7-<DummyProcess(Thread-6, started daemon 140209115088640)>140209446508360
2-<DummyProcess(Thread-11, started daemon 140208527894272)>140209446508360
None-<DummyProcess(Thread-1, started daemon 140209230952192)>140209446508360
10-<DummyProcess(Thread-4, started daemon 140209131874048)>140209446508360
12-<DummyProcess(Thread-7, started daemon 140209106695936)>140209446508360
8-<DummyProcess(Thread-3, started daemon 140209140266752)>140209446508360
6-<DummyProcess(Thread-12, started daemon 140208519501568)>140209446508360
3-<DummyProcess(Thread-13, started daemon 140208511108864)>140209446508360
11-<DummyProcess(Thread-10, started daemon 140208536286976)>140209446508360
9-<DummyProcess(Thread-9, started daemon 140209089910528)>140209446508360
1-<DummyProcess(Thread-8, started daemon 140209098303232)>140209446508360
Let's wait for 10 seconds
None-<DummyProcess(Thread-14, started daemon 140208502716160)>140209446508360
3-<DummyProcess(Thread-20, started daemon 140208108455680)>140209446508360
1-<DummyProcess(Thread-19, started daemon 140208116848384)>140209446508360
7-<DummyProcess(Thread-17, started daemon 140208133633792)>140209446508360
6-<DummyProcess(Thread-6, started daemon 140209115088640)>140209446508360
4-<DummyProcess(Thread-4, started daemon 140209131874048)>140209446508360
9-<DummyProcess(Thread-16, started daemon 140208485930752)>140209446508360
5-<DummyProcess(Thread-15, started daemon 140208494323456)>140209446508360
2-<DummyProcess(Thread-2, started daemon 140209222559488)>140209446508360
8-<DummyProcess(Thread-18, started daemon 140208125241088)>140209446508360
11-<DummyProcess(Thread-1, started daemon 140209230952192)>140209446508360
10-<DummyProcess(Thread-11, started daemon 140208527894272)>140209446508360
12-<DummyProcess(Thread-5, started daemon 140209123481344)>140209446508360
Let's wait for 10 seconds
None-<DummyProcess(Thread-3, started daemon 140209140266752)>140209446508360
2-<DummyProcess(Thread-10, started daemon 140208536286976)>140209446508360
1-<DummyProcess(Thread-12, started daemon 140208519501568)>140209446508360
4-<DummyProcess(Thread-9, started daemon 140209089910528)>140209446508360
5-<DummyProcess(Thread-14, started daemon 140208502716160)>140209446508360
9-<DummyProcess(Thread-6, started daemon 140209115088640)>140209446508360
8-<DummyProcess(Thread-16, started daemon 140208485930752)>140209446508360
7-<DummyProcess(Thread-4, started daemon 140209131874048)>140209446508360
3-<DummyProcess(Thread-20, started daemon 140208108455680)>140209446508360
6-<DummyProcess(Thread-8, started daemon 140209098303232)>140209446508360
12-<DummyProcess(Thread-13, started daemon 140208511108864)>140209446508360
10-<DummyProcess(Thread-7, started daemon 140209106695936)>140209446508360
11-<DummyProcess(Thread-19, started daemon 140208116848384)>140209446508360
Let's wait for 10 seconds
...

My observation:

  • multiple connections are created (i.e., one connection per worker), but the session object is the same throughout the execution of the code (the session object id never changes)
  • connections keep recycling, as seen in the ss output; I couldn't identify any clear pattern or timeout for the recycling
  • connections stop recycling if I reduce the number of processes to a smaller value (for example, 5)

I do not understand how or why the connections are being recycled, or why they stop recycling when I reduce the process count. I have tried disabling the garbage collector (import gc; gc.disable()), and the connections are still recycled.

I would like the created connections to stay alive until they reach a maximum number of requests. I think this would work without sessions, using a keep-alive Connection header.

But I am curious to know what causes these session connections to keep recycling when the process pool size is high.

I can reproduce this issue against any server, so it does not seem to be server-dependent.
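A likely explanation for the threshold (an assumption based on requests' documented defaults, not something stated in the question): a Session mounts an HTTPAdapter with pool_connections=10 and pool_maxsize=10 by default. With 20 workers, connections opened beyond the pool limit cannot be returned to the full pool and are closed, which looks like "recycling"; a 5-worker pool never exceeds the limit, so its connections persist. A minimal sketch to inspect the default:

```python
# Sketch: inspect requests' default connection-pool size, which would
# explain why recycling appears above ~10 workers but not at 5.
import requests
from requests.adapters import DEFAULT_POOLSIZE

session = requests.Session()
adapter = session.get_adapter("https://myserver.com")  # the default mounted HTTPAdapter

# Both defaults are 10. Connections opened beyond pool_maxsize are
# discarded on return (urllib3 logs "Connection pool is full,
# discarding connection" at WARNING level when this happens).
print(DEFAULT_POOLSIZE)       # 10
print(adapter._pool_maxsize)  # 10 (private attribute, inspected here for illustration only)
```

Enabling urllib3's WARNING-level logging while the pool runs is one way to confirm whether this is what you are seeing.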


Answer 1:


I solved the same issue for myself by creating a session for each batch of parallel request executions. At first I used multiprocessing.dummy too, but I ran into the same issue as you and switched to concurrent.futures.thread.ThreadPoolExecutor.

Here is my solution.

from concurrent.futures.thread import ThreadPoolExecutor
from functools import partial

from requests import Session, Response
from requests.adapters import HTTPAdapter

def thread_pool_execute(iterables, method, pool_size=30) -> list:
    """Run requests in a thread pool; returns a list of responses."""
    session = Session()
    session.mount('https://', HTTPAdapter(pool_maxsize=pool_size))  # that's it
    session.mount('http://', HTTPAdapter(pool_maxsize=pool_size))  # that's it
    worker = partial(method, session)
    with ThreadPoolExecutor(pool_size) as pool:
        results = pool.map(worker, iterables)
    session.close()
    return list(results)

def simple_request(session, url) -> Response:
    return session.get(url)

response_list = thread_pool_execute(list_of_urls, simple_request)

I have tested this against sitemaps with 200k URLs using pool_size=150 without any problems; it is limited only by the target host's configuration.
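As a sanity check (a sketch building on the answer above, with example.com standing in for a real host), you can verify that the mounted adapter actually carries the larger pool size before firing off requests:

```python
from requests import Session
from requests.adapters import HTTPAdapter

pool_size = 30
session = Session()
session.mount('https://', HTTPAdapter(pool_maxsize=pool_size))
session.mount('http://', HTTPAdapter(pool_maxsize=pool_size))

# get_adapter returns the most specific adapter mounted for a URL prefix;
# _pool_maxsize is private, but handy for a quick inspection.
adapter = session.get_adapter('https://example.com/todos/1')
print(adapter._pool_maxsize)  # 30
```

With pool_maxsize equal to the worker count, every thread can return its connection to the pool, so connections are kept alive and reused rather than discarded.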



Source: https://stackoverflow.com/questions/65365783/how-do-connections-recycle-in-a-multiprocess-pool-serving-requests-from-a-single
