Screen-scrape a web page that uses JavaScript and frames

纵然是瞬间 submitted on 2019-12-13 07:56:02

Question


I want to scrape data from www.marktplaats.nl so that I can analyze the scraped description, price, date and number of views in Excel/Access.
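Since the goal is analysis in Excel/Access, one simple route is to write the scraped fields to a CSV file, which both programs can import. The sketch below uses Python's standard library (the question used Ruby, but the idea carries over). The `Listing` record and `listings_to_csv` function are hypothetical names, and the sample values in the usage note are made up.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable


@dataclass
class Listing:
    # Hypothetical record: the four values the question wants to analyze.
    description: str
    price: str
    date: str
    views: int


def listings_to_csv(listings: Iterable[Listing]) -> str:
    """Return the listings as CSV text (header row first) for Excel/Access import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Listing)])
    writer.writeheader()
    for listing in listings:
        # csv quotes fields that contain commas, so free-text descriptions stay intact.
        writer.writerow(asdict(listing))
    return buffer.getvalue()
```

For example, `listings_to_csv([Listing("Bike, 28 inch", "EUR 50", "2010-02-07", 12)])` yields a header row followed by one data row with the description quoted. To write a file instead, open it with `newline=""`; `encoding="utf-8-sig"` adds a byte-order mark, which usually helps Excel detect UTF-8 text.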

I tried to scrape data with Ruby (nokogiri, scrapi), but nothing worked, although the same tools worked well on other sites. The main problem is that tools such as SelectorGadget and the Firebug add-on for Firefox don't find any CSS selectors I can use to scrape the page. On other sites I can extract the CSS selectors with SelectorGadget or Firebug and use them with nokogiri or scrapi. Because I lack experience, it is hard for me to identify the problem, which makes searching for a solution difficult.

Can you tell me where to start solving this problem, and where I might find more information about a similar scraping process?

Thanks in advance!


Answer 1:


I used an Excel web query and it works perfectly. You can find a lot about scraping with Excel on YouTube if you search for MrExcel. Thanks, Mello




Answer 2:


You can try the IRobotSoft web scraper. It has good frame support and is free.




Answer 3:


Iframes aren't a problem: just request the embedded iframe's URL directly. If you open that URL in a browser it redirects unless you disable JavaScript, but a plain HTTP request does not execute JavaScript, so it gets the frame's HTML directly.
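A minimal sketch of that approach using only Python's standard library: parse the outer page, collect the `src` of every `<frame>`/`<iframe>`, resolve it to an absolute URL, and fetch that URL without running JavaScript. The function names are hypothetical, and `example.com` in the usage note stands in for the real site, whose frame URLs are not given here.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin
from urllib.request import urlopen


class FrameSrcCollector(HTMLParser):
    """Collect the absolute src URL of every <frame> and <iframe> tag."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.frame_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so "IFRAME" also matches.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL so they can be fetched directly.
                self.frame_urls.append(urljoin(self.base_url, src))


def find_frame_urls(html: str, base_url: str) -> List[str]:
    collector = FrameSrcCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.frame_urls


def fetch_html(url: str) -> str:
    # urlopen follows HTTP redirects but never runs JavaScript, so a script-based
    # redirect inside the frame page has no effect and its raw HTML is returned.
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Typical use: fetch the outer page with `fetch_html`, pass its HTML and URL to `find_frame_urls`, then fetch each returned URL and run your selectors against that HTML instead of the outer page. For instance, an `<iframe src="/listing/frame.html">` on `https://www.example.com/page` resolves to `https://www.example.com/listing/frame.html`.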

The description and date can be extracted straight from the HTML source. Prices, however, are rendered as images, which makes scraping them more cumbersome.
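Once you have the frame's HTML, a parser can pull the text fields and, for prices, record the image URLs instead of text. This is a sketch only: the class names `description`, `date` and `price` are hypothetical placeholders, since the real markup of the page is not shown here; inspect the frame's source to find the actual ones. Turning the price images into numbers (for example by OCR) is not covered.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple

# Hypothetical class names; replace them with the ones used in the real frame source.
TEXT_CLASSES = ("description", "date")
PRICE_CLASS = "price"


class ListingParser(HTMLParser):
    """Collect text from description/date elements and image URLs inside price elements."""

    def __init__(self) -> None:
        super().__init__()
        self.texts: Dict[str, List[str]] = {name: [] for name in TEXT_CLASSES}
        self.price_images: List[str] = []
        self._field = None  # class name of the element we are currently inside
        self._tag = None    # tag name of that element
        self._depth = 0     # nesting depth of same-named tags, to find its real end tag

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img":
            # Prices are images, so keep the image URL rather than any text.
            if self._field == PRICE_CLASS and attributes.get("src"):
                self.price_images.append(attributes["src"])
            return
        if self._field is None:
            for cls in (attributes.get("class") or "").split():
                if cls in TEXT_CLASSES or cls == PRICE_CLASS:
                    self._field, self._tag, self._depth = cls, tag, 1
                    if cls in TEXT_CLASSES:
                        self.texts[cls].append("")
                    break
        elif tag == self._tag:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._field is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self._field = None

    def handle_data(self, data):
        if self._field in TEXT_CLASSES:
            self.texts[self._field][-1] += data


def parse_listings(html: str) -> Tuple[Dict[str, List[str]], List[str]]:
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the HTML layout.
    texts = {name: [" ".join(t.split()) for t in values] for name, values in parser.texts.items()}
    return texts, parser.price_images
```

Tracking the depth of same-named tags means a nested `<span>` inside a description does not end the field early, while unrelated inline tags such as `<b>` just contribute their text.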



Source: https://stackoverflow.com/questions/2216826/screen-scrape-a-web-page-that-uses-javascript-and-frames
