nokogiri | 易学教程

Rails 3.0 Parsing XML and Inserting into Database

Question: I'm new to Rails and am trying to hook my application up to a third-party API (it doesn't have a gem or plugin for Rails). Ideally, I want to parse the data (I've heard good things about Nokogiri, but I don't know how to use it for what I want to do, and not for lack of trying) and then insert it into the database. Could anybody provide instructions or point me in the right direction? Cheers. UPDATE: Rake Task: task :fetch_flyers => :environment do require 'nokogiri'

I can't remove whitespaces from a string parsed by Nokogiri

Question: I can't remove whitespace from a string. My HTML is: <p class='your-price'> Cena pro Vás: <strong>139 <small>Kč</small></strong> </p> My code is: #encoding: utf-8 require 'rubygems' require 'mechanize' agent = Mechanize.new site = agent.get("http://www.astratex.cz/podlozky-pod-raminka/doplnky") price = site.search("//p[@class='your-price']/strong/text()") val = price.first.text => "139 " val.strip => "139 " val.gsub(" ", "") => "139 " gsub, strip, etc. don't work. Why, and how do I fix
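
A common cause of this symptom is that the trailing character is a non-breaking space (U+00A0) rather than an ASCII space. The excerpt does not confirm this, so the sketch below treats it as an assumption and builds the string by hand instead of fetching the page. It shows why `strip` and `gsub(" ", "")` leave such a string unchanged, and one way to remove the character.

```ruby
# Assumption: the stray character is U+00A0 (non-breaking space),
# not an ASCII space. The value is built by hand for illustration.
val = "139\u00A0"

stripped  = val.strip                    # strip only trims ASCII whitespace and NUL
ascii_sub = val.gsub(" ", "")            # a literal ASCII space does not match U+00A0
cleaned   = val.gsub(/[[:space:]]/, "")  # the POSIX bracket class is Unicode-aware

p val.codepoints  # inspect what the string really contains
p cleaned
```

Printing `codepoints` is a quick way to check which character is actually present before choosing a fix.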

How to get the raw HTML source code for a page by using Ruby or Nokogiri?

Question: I'm using Nokogiri (a Ruby XPath library) to grep contents on web pages. I then ran into problems with some pages, such as Ajax pages: when I view the source code I don't see the exact contents, such as <table>, etc. How can I get the HTML code for the actual content? Answer: Don't use Nokogiri at all if you want the raw source of a web page. Just fetch the web page directly as a string, and do not feed that to Nokogiri. For example: require 'open-uri' html = URI.open('http://phrogz.net').read puts html.length #=> 8461 puts html #=> ...raw source of the page... If, on the other

How to add a new node to XML

Question: I have a simple XML file, items.xml: <?xml version="1.0" encoding="UTF-8" ?> <items> <item> <name>mouse</name> <manufacturer>Logicteh</manufacturer> </item> <item> <name>keyboard</name> <manufacturer>Logitech - Inc.</manufacturer> </item> <item> <name>webcam</name> <manufacturer>Logistech</manufacturer> </item> </items> I am trying to insert a new node with the following code: require 'rubygems' require 'nokogiri' f = File.open('items.xml') @items = Nokogiri::XML(f) f.close price = Nokogiri:

grabbing text between two elements in nokogiri?

Question: <body> <div>some text</div> I NEED THIS TEXT ONLY <div>some text</div> more text here <div>some text</div> one more text here <div>some text</div> </body> How? Answer: Use: /*/div[1]/following-sibling::text()[1] This selects the first text-node sibling of the first div child of the top element of the document. Answer: This returns the first text node within body between two div elements: /body/text()[ ./preceding::element()[1][local-name()="div"] and ./following::element()[1][local-name()="div"] ][1] It should return I NEED THIS TEXT ONLY. Answer: This XPath 1.0: /body/text()[preceding-sibling::*[1][self::div]]

How to click link in Mechanize and Nokogiri?

I'm using Mechanize to scrape Google Wallet for Order data. I am capturing all the data from the first page, however, I need to automatically link to subsequent pages to get more info. The #purchaseOrderPager-pagerNextButton will move to the next page so I can pick up more records to capture. The element looks like this. I need to click on it to keep going. <a id="purchaseOrderPager-pagerNextButton" class="kd-button small right" href="purchaseorderlist?startTime=0&... ;currentPageStart=1&currentPageEnd=25&inputFullText="> <img src="https://www.gstatic.com/mc3/purchaseorder/page-right.png"></a>

Convert XML collection (of Pivotal Tracker stories) to Ruby hash/object

Question: I have a collection of stories in an XML format. I would like to parse the file and return each story as either a hash or a Ruby object, so that I can further manipulate the data within a Ruby script. Does Nokogiri support this, or is there a better tool/library to use? The XML document has the following structure, returned via Pivotal Tracker's web API: <?xml version="1.0" encoding="UTF-8"?> <stories type="array" count="145" total="145"> <story> <id type="integer">16376</id> <story_type>feature<

XPath to find all following siblings up until the next sibling of a particular type

Question: Given this XML/HTML: <dl> <dt>Label1</dt><dd>Value1</dd> <dt>Label2</dt><dd>Value2</dd> <dt>Label3</dt><dd>Value3a</dd><dd>Value3b</dd> <dt>Label4</dt><dd>Value4</dd> </dl> I want to find all <dt> and then, for each, find the following <dd> up until the next <dt>. Using Ruby's Nokogiri I am able to accomplish this like so: dl.xpath('dt').each do |dt| ct = dt.xpath('count(following-sibling::dt)') dds = dt.xpath("following-sibling::dd[count(following-sibling::dt)=#{ct}]") puts "#{dt.text}: #

I need to scrape data from a facebook game - using ruby

Revised (clarified question) I've spent a few days already trying to figure out how to scrape specific information from a Facebook game; however, I've run into brick wall after brick wall. As best as I can tell, the main problem is as follows. I can use Chrome's inspect element tool to manually find the HTML that I need; it appears nestled inside an iframe. However, when I try to scrape that iframe, it is empty (except for properties): <iframe id="game_frame" name="game_frame" src="" scrolling="no" ...></iframe> This is the same output that I see if I use a browser's "View page source" tool.

Building blank XML tags with Nokogiri?

Question: I'm trying to build up an XML document using Nokogiri. Everything is pretty standard so far; most of my code just looks something like: builder = Nokogiri::XML::Builder.new do |xml| ... xml.Tag1(object.attribute_1) xml.Tag2(object.attribute_2) xml.Tag3(object.attribute_3) xml.Tag4(nil) end builder.to_xml However, that results in a tag like <Tag4/> instead of <Tag4></Tag4>, which is what my end user has specified that the output needs to be. How do I tell Nokogiri to put full tags around a nil value? Answer: SaveOptions::NO_EMPTY_TAGS will get you what you want. require 'nokogiri' builder = Nokogiri: