Question
Basically, I will have multiple .main_entry blocks on each page, and I need to pull a couple of pieces of data from each. How can this be properly refactored into methods?
require 'open-uri'
require 'nokogiri'

url = #url
doc = Nokogiri::HTML(open(url))

doc.css(".main_entry").each do |item|
  artist = item.at_css(".list_artist").text
  title = item.at_css(".list_album").text
  puts "#{artist} - #{title}"
end
I have arrived at the mess below, which throws the undefined local variable or method 'release' error. The error seems to be related to methods being overwritten. Could you explain what process the code below goes through, why it breaks down, and what I should turn to for the fix? Should each .main_entry block be saved into some kind of cache or array first, before instantiating?
require 'open-uri'
require 'nokogiri'

class Scraper
  def initialize(url)
    @url = url
  end

  def release
    @release ||= doc.css(".main_entry") || []
  end

  release.each do |item|
    define_method(:artist) do
      @artist ||= item.at_css(".list_artist").text
    end

    define_method(:title) do
      @title ||= item.at_css(".list_album").text
    end
  end

  private

  attr_reader :url

  def doc
    @doc ||= Nokogiri::HTML(open(url))
  end
end

scraper = Scraper.new( #url
puts "#{scraper.artist} - #{scraper.title}"
Answer 1:
Here is my suggestion:
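require 'open-uri'
require 'nokogiri'

# Wraps a single .main_entry node and exposes its fields.
class ScrapedRelease
  attr_reader :item

  def initialize(item)
    @item = item
  end

  def artist
    @artist ||= item.at_css(".list_artist").text
  end

  def title
    @title ||= item.at_css(".list_album").text
  end
end

# Fetches the page and builds one ScrapedRelease per .main_entry block.
class Scraper
  def initialize(url)
    @url = url
  end

  def releases
    # css returns a (possibly empty) NodeSet, so no nil fallback is needed.
    @releases ||= doc.css(".main_entry").map { |item| ScrapedRelease.new(item) }
  end

  private

  attr_reader :url

  def doc
    # URI.open is required on Ruby 3.0+, where Kernel#open no longer handles URLs.
    @doc ||= Nokogiri::HTML(URI.open(url))
  end
end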
Then you can do:
Scraper.new(url).releases.each do |release|
  puts "#{release.artist} - #{release.title}"
end
Source: https://stackoverflow.com/questions/26266552/refactoring-ruby-scraping-code