Web crawling tools which support interacting with target sites before beginning to crawl

[亡魂溺海] submitted on 2019-12-25 05:38:28

Question


I am looking for a crawler that can handle pages with Ajax and perform certain user interactions with the target site before starting to crawl it (e.g., clicking certain menu items, filling in some forms, etc.). I tried webdriver/selenium (which are really web scraping tools), and now I want to know whether there is any crawler that supports emulating certain user interactions before starting to crawl (in Java, Python, Ruby, ...).

Thanks

P.S. Can Nutch do this? If yes, I would appreciate any link describing it.


Answer 1:


Nutch does not handle AJAX, cookies or any of the user interactions that you described.




Answer 2:


You could try hooking up Selenium to a Python-based crawler like Scrapy. Whenever AJAX needs to be handled, the crawler can fire up an external Selenium process to do the scraping.
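To make the "interact first, then crawl" idea concrete, here is a minimal sketch. It is not the Scrapy integration itself: a small standard-library link extractor stands in for Scrapy so the example stays self-contained. The names `run_steps`, `crawl`, `crawl_with_selenium` and the `(action, css_selector, text)` step format are made up for illustration. The `driver` argument is assumed to behave like a Selenium WebDriver (`get`, `find_element`, `page_source`, `current_url`), and Selenium itself is only imported inside `crawl_with_selenium`.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def run_steps(driver, steps):
    """Replay user interactions; each step is (action, css_selector, text)."""
    for action, selector, text in steps:
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        element = driver.find_element("css selector", selector)
        if action == "click":
            element.click()
        elif action == "fill":
            element.clear()
            element.send_keys(text)
        else:
            raise ValueError(f"unknown action: {action}")


def crawl(driver, start_url, steps, max_pages=10):
    """Open start_url, replay the steps, then crawl same-host links breadth-first."""
    driver.get(start_url)
    run_steps(driver, steps)  # the session now carries the post-interaction state
    host = urlparse(start_url).netloc
    current = driver.current_url
    seen = {current}
    queue = deque()
    pages = {}
    while True:
        html = driver.page_source
        pages[current] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urldefrag(urljoin(current, href)).url  # absolute URL, no #fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
        if not queue or len(pages) >= max_pages:
            break
        current = queue.popleft()
        driver.get(current)
    return pages


def crawl_with_selenium(start_url, steps, max_pages=10):
    """Run the crawl in a real browser (assumes Selenium and Firefox are installed)."""
    from selenium import webdriver  # third-party, imported only when needed

    driver = webdriver.Firefox()
    try:
        return crawl(driver, start_url, steps, max_pages)
    finally:
        driver.quit()
```

A call might look like `crawl_with_selenium("https://example.com/login", [("fill", "#user", "alice"), ("click", "#submit", None)])`, where the URL and selectors are placeholders. Because the same browser session is reused for every page, cookies and any state set by the interactions carry over to the crawl. The trade-off is speed, since every page is rendered in a browser.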



Source: https://stackoverflow.com/questions/6507040/web-crawling-tools-which-support-interacting-with-target-sites-before-begining-t
