asynchronous slower than synchronous

醉酒当歌 submitted on 2020-01-14 03:38:11

Question


My program does the following:

  • take a folder of txt files
  • for each file:
    • read the file
    • send a POST request to an API on localhost using the file content
    • parse the XML response (not in the example below)

I was concerned about the performance of the synchronous version of the program, so I tried using aiohttp to make it asynchronous (it's my first attempt at async programming in Python, besides Scrapy). It turned out that the async code took twice as long, and I don't understand why.

SYNCHRONOUS CODE (152 seconds)

import json
import os
from glob import glob

import requests

url = "http://localhost:6090/api/analyzexml"
package = ...  # placeholder: name of the package I send in each request
with open("template.txt", "r", encoding="utf-8") as f:
    template = f.read()

articles_path = ...  # placeholder: location of my text files

def fetch(session, url, article_text):
    data = {"package": package, "data": template.format(article_text)}
    response = session.post(url, data=json.dumps(data))
    print(response.text)

files = glob(os.path.join(articles_path, "*.txt"))

with requests.Session() as s:
    for file in files:
        with open(file, "r", encoding="utf-8") as f:
            article_text = f.read()
        fetch(s, url, article_text)

Profiling results:

+--------+---------+----------+---------+----------+-------------------------------------------------------+
| ncalls | tottime | percall  | cumtime | percall  |               filename:lineno(function)               |
+--------+---------+----------+---------+----------+-------------------------------------------------------+
|    849 |   145.6 |   0.1715 |   145.6 |   0.1715 | ~:0(<method 'recv_into' of '_socket.socket' objects>) |
|      2 |   1.001 |   0.5007 |   1.001 |   0.5007 | ~:0(<method 'connect' of '_socket.socket' objects>)   |
|    365 |   0.772 | 0.002115 |   1.001 | 0.002742 | ~:0(<built-in method builtins.print>)                 |
+--------+---------+----------+---------+----------+-------------------------------------------------------+

(WANNABE) ASYNCHRONOUS CODE (327 seconds)

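import asyncio

from aiohttp import ClientSession

# json, os, glob, url, package, template and articles_path
# are the same as in the synchronous version

async def fetch(session, url, article_text):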
    data = {"package": package, "data": template.format(article_text)}
    async with session.post(url, data=json.dumps(data)) as response:
        return await response.text()

async def process_files(articles_path):
    tasks = []

    async with ClientSession() as session:
        files = glob(os.path.join(articles_path, "*.txt"))
        for file in files:
            with open(file, "r", encoding="utf-8") as f:
                article_text = f.read()
            task = asyncio.ensure_future(
                fetch(session=session, url=url, article_text=article_text)
            )
            tasks.append(task)
            responses = await asyncio.gather(*tasks)
            print(responses)


loop = asyncio.get_event_loop()
future = asyncio.ensure_future(process_files(articles_path))
loop.run_until_complete(future)

Profiling results:

+--------+---------+---------+---------+---------+-----------------------------------------------+
| ncalls | tottime | percall | cumtime | percall |           filename:lineno(function)           |
+--------+---------+---------+---------+---------+-----------------------------------------------+
|   2278 |     156 | 0.06849 |     156 | 0.06849 | ~:0(<built-in method select.select>)          |
|    365 |   128.3 |  0.3516 |   168.9 |  0.4626 | ~:0(<built-in method builtins.print>)         |
|    730 |   40.54 | 0.05553 |   40.54 | 0.05553 | ~:0(<built-in method _codecs.charmap_encode>) |
+--------+---------+---------+---------+---------+-----------------------------------------------+

I am clearly missing something about this concept. Could someone also help me understand why print takes so much time in the async version (see the profiling results)?


Answer 1:


Because it's not asynchronous :)

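Look at your code: you call responses = await asyncio.gather(*tasks) inside the loop, once per file. Each new task therefore has to finish before the next one is even created, so the requests still run one after another, and on top of that you pay the overhead of coroutine handling every time.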

I suppose it's just an indentation error: if you unindent responses = await asyncio.gather(*tasks) (together with the print(responses) that follows it) so that it comes after the for file in files loop, the tasks will actually run concurrently.
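
To see the effect without the real API, here is a minimal sketch that uses asyncio.sleep as a stand-in for the POST request. The names fake_fetch, DELAY, gather_inside_loop, gather_after_loop and timed are made up for illustration and are not part of your code, and the 0.05 s latency is an arbitrary assumption. The first variant awaits gather inside the loop, as in the question; the second awaits it once, after the loop.

```python
import asyncio
import time

DELAY = 0.05  # assumed per-request latency, for illustration only


async def fake_fetch(i):
    # Hypothetical stand-in for the aiohttp POST: waits, then returns a dummy body.
    await asyncio.sleep(DELAY)
    return f"response {i}"


async def gather_inside_loop(n):
    # Mirrors the question's code: gather() is awaited on every iteration,
    # so each new task must finish before the next one is created.
    tasks = []
    responses = []
    for i in range(n):
        tasks.append(asyncio.ensure_future(fake_fetch(i)))
        responses = await asyncio.gather(*tasks)
    return responses


async def gather_after_loop(n):
    # Suggested fix: create all tasks first, then await them together once.
    tasks = [asyncio.ensure_future(fake_fetch(i)) for i in range(n)]
    return await asyncio.gather(*tasks)


def timed(coro_func, n):
    # Run one variant in a fresh event loop and measure wall-clock time.
    start = time.perf_counter()
    result = asyncio.run(coro_func(n))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 10
    slow_result, slow_time = timed(gather_inside_loop, n)
    fast_result, fast_time = timed(gather_after_loop, n)
    print(f"gather inside loop: {slow_time:.2f}s")
    print(f"gather after loop:  {fast_time:.2f}s")
```

With ten simulated requests, the first variant should take roughly ten times DELAY and the second roughly one DELAY, though exact timings will vary by machine. Note also that in the question's code print(responses) sits inside the loop as well, so it prints the whole, growing list of responses on every iteration; that is probably why print dominates the async profile.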



Source: https://stackoverflow.com/questions/50003803/asynchronous-slower-than-synchronous
