Trouble scraping from a page

Submitted by 本小妞迷上赌 on 2020-01-05 12:16:55

Question


Referring to one of my previous questions, I have to scrape all the reviews of a hotel, for example this hotel.

Using BeautifulSoup, I first collect all the review page links from the pagination inside the div with class BVRRPager BVRRPageBasedPager, and then scrape the reviews from every page. The problem with BeautifulSoup is that the content of div.BVRRRatingSummary is not present in the HTML it receives (try loading that page with JavaScript disabled).
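
A minimal sketch of the pagination step, assuming the pager links are ordinary <a href> elements inside that div. It uses the standard-library html.parser instead of BeautifulSoup so it runs without extra packages; the names PagerLinkParser and pager_links are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PagerLinkParser(HTMLParser):
    """Collect the hrefs of <a> tags nested inside the pager <div>."""

    # Both classes must be present on the div (order does not matter).
    PAGER_CLASSES = {"BVRRPager", "BVRRPageBasedPager"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.depth = 0  # > 0 while inside the pager div; counts nested divs
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                self.depth += 1  # a div nested inside the pager
            elif self.PAGER_CLASSES <= set((attrs.get("class") or "").split()):
                self.depth = 1  # entering the pager div
        elif tag == "a" and self.depth and attrs.get("href"):
            # Resolve relative links and skip duplicates (e.g. "next" and "2").
            url = urljoin(self.base_url, attrs["href"])
            if url not in self.links:
                self.links.append(url)

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def pager_links(html, base_url):
    """Return absolute URLs of all review pages listed in the pager."""
    parser = PagerLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

With BeautifulSoup the same selection can be written as soup.select("div.BVRRPager.BVRRPageBasedPager a[href]"). Either way, this only sees what is in the raw HTML, which is why div.BVRRRatingSummary stays empty.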

I have scraped the reviews using Selenium, but my client does not want to use Selenium because it loads the full page, including JavaScript and images.

I want to know what process the page might be using to load the reviews. Is there any way I can scrape the content of div.BVRRRatingSummary with BeautifulSoup?


Answer 1:


You could try Firefox with the Firebug add-on. Open Firebug while the page is loading, go to the Net tab, and click XHR. That shows which JSON files are being loaded. You can then request those files directly and work with them using a library such as simplejson.
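
A minimal sketch of that approach in Python, using the standard-library json module (simplejson offers the same loads() call). The endpoint URL and the payload shape ("Results", "Rating", "ReviewText") are placeholders, not the site's real API: copy the actual URL from the XHR list and adapt extract_reviews to whatever structure the response really has.

```python
import json
from urllib.request import Request, urlopen

# HYPOTHETICAL placeholder: replace with the URL shown in the XHR list.
REVIEWS_URL = "https://example.com/path/to/reviews.json?page=1"


def fetch_json(url, timeout=10):
    """Download a JSON endpoint directly, without rendering the page or its images."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return json.loads(resp.read().decode(charset))


def extract_reviews(payload):
    """Return (rating, text) pairs from an ASSUMED payload shape:
    {"Results": [{"Rating": 5, "ReviewText": "..."}, ...]}
    """
    return [
        (item.get("Rating"), (item.get("ReviewText") or "").strip())
        for item in payload.get("Results", [])
    ]


# Usage (performs a network request):
# reviews = extract_reviews(fetch_json(REVIEWS_URL))
```

This downloads only the data the page's JavaScript would have fetched, which avoids the full page load the client objected to with Selenium.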



Source: https://stackoverflow.com/questions/27176391/trouble-in-scraping-from-a-page
