Question
Referring to one of my previous questions, I have to scrape all the reviews of a hotel, for example this hotel.
Using BeautifulSoup, I first collect the links to all review pages from the pagination inside the div with class BVRRPager BVRRPageBasedPager, and then scrape the reviews from each page.
The problem with BeautifulSoup is that the content in div.BVRRRatingSummary does not come along (try loading that page with JS disabled).
I have scraped the reviews using Selenium, but my client does not want to use Selenium because it loads the full page with JS and images.
What kind of process might they be using to load the reviews? And is there any way I can scrape the content in div.BVRRRatingSummary with BeautifulSoup?
Answer 1:
You could try using Firefox with the Firebug add-on. Open Firebug while the web page is loading, go to the Net tab, and click XHR. That will show you which JSON files are being loaded. You can then try to fetch those files directly and work with them using a library like simplejson.
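As a rough sketch of that idea, the snippet below fetches such an endpoint directly and decodes it, without loading the page, its JS, or its images. Everything specific in it is an assumption for illustration: `REVIEWS_URL` is a placeholder for whatever URL actually shows up in the XHR list, the `reviews` key is a hypothetical field name, and the response is assumed to be either plain JSON or JSON wrapped in a JSONP callback. It uses the standard-library `json` module, which offers the same `loads` interface as simplejson.

```python
import json
import re
from urllib.request import Request, urlopen

# Hypothetical endpoint: replace it with the URL seen in the browser's XHR list.
REVIEWS_URL = "https://example.com/reviews.json?page=1"


def strip_jsonp(text):
    """Return the JSON payload from a plain JSON or JSONP response body."""
    text = text.strip()
    # A JSONP body looks like: callbackName({...});
    match = re.match(r"^[\w$.]+\s*\((.*)\)\s*;?$", text, re.DOTALL)
    return match.group(1) if match else text


def parse_reviews(body):
    """Decode the body and return the list under the assumed 'reviews' key."""
    data = json.loads(strip_jsonp(body))
    return data.get("reviews", [])


def fetch_reviews(url=REVIEWS_URL):
    """Download the endpoint directly and return the parsed reviews."""
    # Some servers reject requests that lack a browser-like User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return parse_reviews(body)
```

If the endpoint turns out to return an HTML fragment embedded in the JSON rather than structured fields, that fragment can be passed to BeautifulSoup, which may be enough to reach div.BVRRRatingSummary without a browser.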
Source: https://stackoverflow.com/questions/27176391/trouble-in-scraping-from-a-page