So I'm using Python and BeautifulSoup4 (which I'm not tied to) to scrape a website. The problem is that when I use urllib to grab the HTML of a page, it's not the entire page because…
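There are basically two main options to proceed with:

1. Find out which additional requests the page makes to load its content (for example, in the Network tab of your browser's developer tools) and replicate those requests directly from your script.
2. Automate a real browser with a tool like Selenium, so the page is loaded the same way it is for any user.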
The first option is harder to implement and, generally speaking, more fragile, but it doesn't require a real browser and can be faster.
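Here is a minimal sketch of the request-replication approach using only the standard library (`urllib`, as in the question). It assumes the missing content comes from a JSON endpoint you spotted in the browser's Network tab; the endpoint URL, the headers, and the `{"items": [{"title": ...}]}` response shape are all hypothetical and need to be adapted to the real site.

```python
import json
import urllib.parse
import urllib.request


def build_api_url(base_url, params):
    """Append query parameters (e.g. paging arguments) to the endpoint URL."""
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch_json(url, timeout=10):
    """Request the URL directly and decode its JSON body into Python objects."""
    request = urllib.request.Request(
        url,
        headers={
            # Assumption: some sites reject requests without a browser-like
            # User-Agent or the header their own JavaScript sends with AJAX calls.
            "User-Agent": "Mozilla/5.0",
            "X-Requested-With": "XMLHttpRequest",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))


def extract_titles(payload):
    """Pick the fields you need from the decoded JSON.

    The {"items": [{"title": ...}]} shape is hypothetical; adapt it to
    whatever structure the real endpoint returns.
    """
    return [item["title"] for item in payload.get("items", [])]


# Usage (hypothetical endpoint; substitute the URL from the Network tab):
# url = build_api_url("https://example.com/api/items", {"page": 1})
# print(extract_titles(fetch_json(url)))
```

Because the endpoint already returns structured data, there is no HTML to parse here, which is part of why this route is fast; the flip side is that it breaks as soon as the site changes its internal API.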
The second option is better in that you get exactly what any real user gets, and you don't have to worry about how the page was loaded. Selenium is quite powerful at locating elements on a page, so you may not need BeautifulSoup at all. Either way, this option is slower than the first one.
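A minimal sketch of the Selenium route, assuming Selenium 4 and a local Chrome installation (Selenium 4.6+ can usually locate a matching driver on its own). The function name and the CSS selector you pass in are illustrative; the idea is to wait until the JavaScript-rendered elements appear and then read them with Selenium's own locators.

```python
def scrape_rendered(url, css_selector, timeout=10):
    """Load the page in headless Chrome and return the text of matching elements."""
    # Imported inside the function so the rest of a script can load
    # even on machines where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until at least one matching element exists (or time out),
        # i.e. until the page's JavaScript has rendered the content.
        elements = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, css_selector))
        )
        return [element.text for element in elements]
    finally:
        driver.quit()  # always close the browser, even on errors


# Usage (hypothetical URL and selector):
# print(scrape_rendered("https://example.com/products", "div.product h2"))
```

If you would rather keep your existing BeautifulSoup code, `driver.page_source` returns the rendered HTML, which you can pass to BeautifulSoup instead of reading elements through Selenium.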
Hope that helps.