Asyncio web scraping 101: fetching multiple URLs with aiohttp

故里飘歌 2020-12-08 00:59

In an earlier question, one of the authors of aiohttp kindly suggested a way to fetch multiple URLs with aiohttp using the new async with syntax from Python 3.5:
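The question text is cut off here; as a rough sketch of the pattern it refers to (the URL list, session handling, and gather-based driver below are my assumptions, not the original code):

    import asyncio
    import aiohttp

    # Illustrative URLs; the original question used its own list.
    urls = ['http://python.org', 'http://golang.org', 'http://perl.org']

    async def fetch(session, url):
        # async with (new in Python 3.5) guarantees the response is released
        async with session.get(url) as response:
            return await response.text()

    async def fetch_all(urls):
        async with aiohttp.ClientSession() as session:
            # run one fetch per URL concurrently and collect the bodies
            return await asyncio.gather(*(fetch(session, url) for url in urls))

    loop = asyncio.get_event_loop()
    pages = loop.run_until_complete(fetch_all(urls))
    print(len(pages))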

2 Answers
谎友^ (OP) 2020-12-08 01:27

I am far from an asyncio expert, but if you want to catch the error, you need to catch a socket error:

    import socket
    import aiohttp

    async def fetch(session, url):
        # aiohttp.Timeout was the 1.x-era way to bound the whole request
        with aiohttp.Timeout(10):
            try:
                async with session.get(url) as response:
                    print(response.status == 200)
                    return await response.text()
            except socket.error as e:
                # DNS failures, refused connections, etc. land here
                print(e.strerror)

Running the code and printing the results:

    Cannot connect to host sdfkhskhgklhskljhgsdfksjh.com:80 ssl:False [Can not connect to sdfkhskhgklhskljhgsdfksjh.com:80 [Name or service not known]]
    True
    True
    ({<Task finished coro=<fetch() done, defined at <ipython-input-...>:5> result='\n\n'>, <Task finished coro=<fetch() done, defined at <ipython-input-...>:5> result=None>, <Task finished coro=<fetch() done, defined at <ipython-input-...>:5> result=''>}, set())

You can see that we catch the error, and the subsequent calls still succeed, returning the HTML.
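For reference, the (done, pending) shape of that last printed line is what asyncio.wait returns; a driver along these lines would produce it (the URL list and the setup are my guesses, not the answer's actual code):

    import asyncio
    import aiohttp

    # One unresolvable host to trigger the socket error, plus reachable ones (illustrative).
    urls = ['http://sdfkhskhgklhskljhgsdfksjh.com',
            'http://python.org',
            'http://golang.org']

    async def main():
        async with aiohttp.ClientSession() as session:
            # uses the fetch() coroutine defined above
            tasks = [asyncio.ensure_future(fetch(session, url)) for url in urls]
            # asyncio.wait returns a (done, pending) pair of task sets,
            # which matches the tuple printed in the output
            print(await asyncio.wait(tasks))

    asyncio.get_event_loop().run_until_complete(main())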

We should probably really be catching OSError, as socket.error has been a deprecated alias of OSError since Python 3.3:

    import aiohttp

    async def fetch(session, url):
        with aiohttp.Timeout(10):
            try:
                async with session.get(url) as response:
                    return await response.text()
            except OSError as e:
                # socket.error is an alias of OSError, so this catches the same failures
                print(e)

If you also want to check that the response is 200, put the if inside the try as well; you can use the reason attribute to get more info:

    import aiohttp

    async def fetch(session, url):
        with aiohttp.Timeout(10):
            try:
                async with session.get(url) as response:
                    if response.status != 200:
                        # reason gives the HTTP status text, e.g. "Not Found"
                        print(response.reason)
                    return await response.text()
            except OSError as e:
                print(e.strerror)
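A note for readers on current aiohttp: aiohttp.Timeout has since been removed. A rough modern equivalent, assuming aiohttp 3.x, passes a ClientTimeout per request instead (my sketch, not part of the original answer):

    import aiohttp

    async def fetch(session, url):
        try:
            # aiohttp 3.x replaces aiohttp.Timeout with ClientTimeout
            async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as response:
                if response.status != 200:
                    print(response.reason)
                return await response.text()
        except OSError as e:
            print(e.strerror)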
