Speeding up BeautifulSoup

粉色の甜心 2020-12-03 04:16

I'm running a scraper on this course website, and I'm wondering whether there's a faster way to scrape the page once I have it loaded into BeautifulSoup. It takes way longer

3 answers
  •  陌清茗 (original poster)
    2020-12-03 04:18

    I'm gonna post this hidden gem in hopes that it might help someone as it helped me a lot:

    Just make sure you're passing a string object (str) to BeautifulSoup, not bytes.

    If you're using requests, do this:

    page = requests.get(some_url)
    soup = BeautifulSoup(page.text, 'html.parser')  # page.text is the body decoded to str
    

    instead of this:

    page = requests.get(some_url)
    soup = BeautifulSoup(page.content, 'html.parser')  # page.content is the raw bytes
    

    I don't know the reason behind this, and the author of the referenced article doesn't either, but it made my code almost 4 times faster.

    Reference: Speeding Up BeautifulSoup With Large XML Files, James Hodgkinson
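To check whether the str-versus-bytes difference applies to your own pages, a small timing harness can help. This is only a sketch: the helper names `to_text` and `time_parse` and the `declared_encoding` parameter are made up for illustration and are not part of requests or BeautifulSoup. One plausible explanation for the speedup, which neither the answer nor the cited article confirms, is that BeautifulSoup has to work out the encoding itself when it receives bytes, while a str lets it skip that step.

```python
import time


def to_text(raw: bytes, declared_encoding: str = "utf-8") -> str:
    """Decode a response body once so the parser receives str, not bytes.

    Hypothetical helper; `declared_encoding` is assumed to be known,
    for example from the Content-Type header.
    """
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return raw.decode(declared_encoding, errors="replace")


def time_parse(markup, parser: str = "html.parser") -> float:
    """Return the seconds BeautifulSoup takes to parse `markup` (str or bytes)."""
    # Imported here so that to_text() stays usable even without bs4 installed.
    from bs4 import BeautifulSoup

    start = time.perf_counter()
    BeautifulSoup(markup, parser)
    return time.perf_counter() - start
```

Calling `time_parse(page.content)` and `time_parse(to_text(page.content))` on the same response shows how much, if anything, the early decode saves for that document. Results will vary with page size and encoding.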
