Is there a web crawler library available for PHP or Ruby? [closed]

Submitted by 元气小坏坏 on 2019-12-06 09:37:09

Question


Is there a web crawler library available for PHP or Ruby? I'm looking for a library that can crawl depth-first or breadth-first and that resolves links correctly even when they are relative (for example, href="../relative_path.html") or when a base URL is used.


Answer 1:


Check this page out for a Ruby library: Ruby Mechanize

I'd like to mention that you would still be responsible for the way in which your crawler traverses sites.
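To illustrate what "being responsible for the traversal" can look like, here is a minimal sketch in plain Ruby (standard library only, not Mechanize). It resolves relative links such as `../relative_path.html` with `URI.join`, honours a `<base href>` tag if one is present, and visits pages either breadth-first or depth-first. The names `extract_links`, `crawl`, the `fetch` callable, and the `max_pages` limit are all hypothetical choices made for this example, and the regex-based link extraction is an assumption to keep the sketch dependency-free; a real crawler would use an HTML parser.

```ruby
require "uri"
require "set"

# Return the absolute http(s) URLs linked from +html+, resolved against
# +page_url+ or against a <base href="..."> tag when the page has one.
# NOTE: regex extraction is a simplification for this sketch.
def extract_links(html, page_url)
  base = html[/<base\s[^>]*href=["']([^"']+)["']/i, 1]
  base_url = base ? URI.join(page_url, base).to_s : page_url

  links = html.scan(/<a\s[^>]*href=["']([^"']+)["']/i).flatten.map do |href|
    begin
      uri = URI.join(base_url, href)   # handles "../x.html", "/x", "x.html"
      next nil unless uri.is_a?(URI::HTTP) # skip mailto:, javascript:, etc.
      uri.fragment = nil               # "#section" does not name a new page
      uri.to_s
    rescue URI::Error
      nil                              # ignore malformed hrefs
    end
  end
  links.compact
end

# Visit pages starting at +start_url+ and return the URLs in visit order.
# +fetch+ is any callable that takes a URL and returns HTML (or nil on failure).
def crawl(start_url, fetch, strategy: :breadth_first, max_pages: 50)
  depth_first = (strategy == :depth_first)
  frontier = [start_url]   # used as a queue (BFS) or as a stack (DFS)
  visited = Set.new
  order = []

  until frontier.empty? || order.size >= max_pages
    url = depth_first ? frontier.pop : frontier.shift
    next if visited.include?(url)
    visited << url

    html = fetch.call(url)
    next if html.nil?
    order << url

    links = extract_links(html, url)
    # Reverse for DFS so the first link on the page is popped first.
    links.reverse! if depth_first
    frontier.concat(links)
  end
  order
end
```

The `fetch` argument keeps the traversal independent of how pages are downloaded. For a live site you could pass something like `->(url) { Net::HTTP.get(URI(url)) }` after `require "net/http"`, ideally with error handling, politeness delays, and a same-host check added, none of which this sketch attempts.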




Answer 2:


http://phpcrawl.cuab.de/




Answer 3:


You can go for Webrat or Watir in Ruby; both are much easier to use than Mechanize.




Answer 4:


If you'd like to learn the basics of web crawling and search, you can start by looking at "luna engine".




Answer 5:


If you need to scrape web pages that use JavaScript, you can use Capybara with a driver that spins up a real browser, such as Poltergeist. It's usually used with a testing framework for acceptance testing, but it can also be used outside of one.



Source: https://stackoverflow.com/questions/855873/is-there-a-web-crawler-library-available-for-php-or-ruby
