Refactoring Ruby scraping code

Posted by 北城以北 on 2019-12-14 04:01:17

Question


Basically, I will have multiple .main_entry blocks on each page and I will need to pull a couple of pieces of data from each. How can this be properly refactored into methods?

require 'open-uri'
require 'nokogiri'


url = #url
doc = Nokogiri::HTML(URI.open(url)) # URI.open: on Ruby 3.0+ open-uri no longer lets Kernel#open fetch URLs

doc.css(".main_entry").each do |item|
  artist = item.at_css(".list_artist").text
  title = item.at_css(".list_album").text
  puts "#{artist} - #{title}"
end

I have arrived at the mess below, which raises an "undefined local variable or method 'release'" error that seems to be related to methods being overwritten. Could you explain what process the code below goes through, why it breaks down, and what I should turn to for a fix? Should each .main_entry block be saved into some kind of cache or array before instantiating?

require 'open-uri'
require 'nokogiri'

class Scraper
  def initialize(url)
    @url = url
  end

  def release
    @release ||= doc.css(".main_entry") || []
  end

  release.each do |item|
    define_method(:artist) do
      @artist ||= item.at_css(".list_artist").text
    end

    define_method(:title) do
      @title ||= item.at_css(".list_album").text
    end
  end

  private
  attr_reader :url

  def doc
    @doc ||= Nokogiri::HTML(open(url))
  end
end

scraper = Scraper.new( #url

puts "#{scraper.artist} - #{scraper.title}"

Answer 1:
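
Since the question asks why the original breaks, here is a short note before the refactoring. The body of a class definition runs once, at the moment the class is defined, with self set to the class itself. release is an instance method, so the bare call to release.each inside the class body finds nothing to call and Ruby raises a NameError. Even if that call succeeded, every pass of the loop would redefine the same artist and title methods, so only the last entry would survive. The sketch below illustrates both effects; the Demo and Overwritten classes are hypothetical stand-ins, not part of the original code.

```ruby
# The body of `class ... end` is executed immediately, with `self`
# set to the class object rather than to an instance of it.
class Demo
  # Instance method: callable on Demo.new, but not on Demo itself.
  def release
    %w[first second]
  end

  SELF_IN_BODY = self # the class Demo, not an instance

  # A bare `release` here is looked up on the class Demo and fails,
  # which mirrors the error from `release.each` in the question.
  ERROR_IN_BODY =
    begin
      release
      nil
    rescue NameError => e
      e
    end
end

# Defining the same method name in a loop keeps only the last definition.
class Overwritten
  %w[first second].each do |value|
    define_method(:label) { value }
  end
end

puts Demo::SELF_IN_BODY        # prints "Demo"
puts Demo::ERROR_IN_BODY.class # prints "NameError"
puts Overwritten.new.label     # prints "second"
```

This is why one object per .main_entry block is a better fit than defining methods in a loop: each object keeps its own node instead of competing for a single method name.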


Here is my suggestion:

require 'open-uri'
require 'nokogiri'

class ScrapedRelease
  attr_reader :item

  def initialize(item)
    @item = item
  end

  def artist
    @artist ||= item.at_css(".list_artist").text
  end

  def title
    @title ||= item.at_css(".list_album").text
  end
end

class Scraper
  def initialize(url)
    @url = url
  end

  def releases
    @releases ||= (doc.css(".main_entry") || []).map { |item| ScrapedRelease.new(item) }
  end

  private
  attr_reader :url

  def doc
    # URI.open is needed on Ruby 3.0+, where open-uri no longer lets Kernel#open fetch URLs
    @doc ||= Nokogiri::HTML(URI.open(url))
  end
end

Then you can do:

Scraper.new(url).releases.each do |release|
  puts "#{release.artist} - #{release.title}"
end


Source: https://stackoverflow.com/questions/26266552/refactoring-ruby-scraping-code
