Web page scraping gems/tools available in Ruby [closed]

故事扮演 submitted on 2019-12-01 00:33:26

Question


I'm trying to scrape web pages in a Ruby script that I'm working on. The purpose of the project is to show which ETFs and stock mutual funds are most compatible with the value investing philosophy.

Some examples of pages I'd like to scrape are:

http://finance.yahoo.com/q/pr?s=SPY+Profile
http://finance.yahoo.com/q/hl?s=SPY+Holdings
http://www.marketwatch.com/tools/mutual-fund/list/V

What web scraping tools do you recommend for Ruby, and why? Keep in mind that there are thousands of stock funds out there, so any tool I use has to be reasonably quick.

I am new to Ruby, but I have experience using lxml to scrape web pages in Python (https://github.com/jhsu802701/dopplervalueinvesting/blob/master/screen.py). Once the pages on 5000+ stocks are downloaded, lxml can scrape them all in just a few minutes. (I remember trying BeautifulSoup but rejecting it because it was too slow.)


Answer 1:


Ruby has many scraping gems, such as Hpricot and Nokogiri. I recommend Nokogiri for scraping static web pages. For dynamic pages (ones that involve clicking buttons, submitting forms, and so on), I recommend Mechanize, which uses Nokogiri internally.




Answer 2:


There is a list of HTML parsing solutions at https://www.ruby-toolbox.com/categories/html_parsing.html . I'm going with Nokogiri because it's the only one on that list that is still actively maintained.



Source: https://stackoverflow.com/questions/15037392/web-page-scraping-gems-tools-available-in-ruby
