Scraping/Parsing Google search results in Ruby

滥情空心 2021-01-01 05:37

Assume I have the entire HTML of a Google search results page. Does anyone know of any existing code (Ruby?) to scrape/parse the first page of Google search results? Ideal

6 answers
  •  没有蜡笔的小新
    2021-01-01 06:11

    This should be fairly simple. Have a look at the "Screen Scraping with ScrAPI" screencast by Ryan Bates. You can also do it without a dedicated scraping library and just stick to something like Nokogiri.


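    From Nokogiri's documentation (the selectors match the Google markup of that time, so they will likely need adjusting for a current results page):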

    require 'nokogiri'
    require 'open-uri'
    
    # Get a Nokogiri::HTML::Document for the page we’re interested in...
    # (URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs.)
    
    doc = Nokogiri::HTML(URI.open('http://www.google.com/search?q=tenderlove'))
    
    # Do funky things with it using Nokogiri::XML::Node methods...
    
    ####
    # Search for nodes by css
    doc.css('h3.r a.l').each do |link|
      puts link.content
    end
    
    ####
    # Search for nodes by xpath
    doc.xpath('//h3/a[@class="l"]').each do |link|
      puts link.content
    end
    
    ####
    # Or mix and match.
    doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
      puts link.content
    end
    
