I'm running a scraper of this course website and I'm wondering whether there's a faster way to scrape the page once I have it loaded into BeautifulSoup. It takes way longer
I'm gonna post this hidden gem in hopes that it might help someone as it helped me a lot:
Just make sure you're passing a string object to BeautifulSoup and not bytes.
If you're using requests, do this:
import requests
from bs4 import BeautifulSoup
page = requests.get(some_url)
soup = BeautifulSoup(page.text, 'html.parser')  # page.text is an already-decoded str
instead of this:
page = requests.get(some_url)
soup = BeautifulSoup(page.content, 'html.parser')  # page.content is raw bytes
I don't know the reason behind this, and neither does the author of the referenced article, but it made my code almost 4 times faster.
Speeding Up BeautifulSoup With Large XML Files, James Hodgkinson
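If you want to check whether this matters for your own pages, here's a rough timing sketch. It uses a synthetic document instead of a real request, so `html_text` and `html_bytes` are stand-ins I made up for `page.text` and `page.content`, and `parse` and `best_time` are just helper names for this example. One plausible explanation (my assumption, not something the article confirms) is that when BeautifulSoup gets bytes it first has to guess the encoding, which can be slow on large documents. The third timing passes `from_encoding`, which tells BeautifulSoup which encoding to try first when you only have bytes.

```python
import timeit
from bs4 import BeautifulSoup

# Synthetic document standing in for a fetched page (hypothetical content).
rows = "".join(
    f"<tr><td>Course {i}</td><td>caf\u00e9 {i}</td></tr>" for i in range(2000)
)
html_text = f"<html><body><table>{rows}</table></body></html>"  # like page.text
html_bytes = html_text.encode("utf-8")                          # like page.content


def parse(markup, **kwargs):
    """Parse markup (str or bytes) with the built-in html.parser backend."""
    return BeautifulSoup(markup, "html.parser", **kwargs)


def best_time(markup, **kwargs):
    """Return the fastest of three single parse runs, in seconds."""
    return min(timeit.repeat(lambda: parse(markup, **kwargs), number=1, repeat=3))


timings = {
    "str": best_time(html_text),
    "bytes": best_time(html_bytes),
    "bytes + from_encoding": best_time(html_bytes, from_encoding="utf-8"),
}
for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

The numbers will depend on the document, the parser, and whether an encoding-detection library is installed, so don't expect the same 4x on every page.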