nokogiri

Rails 3.0 Parsing XML and Inserting into Database

北战南征 submitted on 2019-12-04 14:40:46
Question: I'm new to Rails, and am trying to hook up my application to a third-party API (it doesn't have a gem or plugin for Rails). Ideally, what I want to be able to do is parse the data (I've heard good things about Nokogiri, but don't know how to use it for what I want to do, not for lack of trying), and then insert it into the database. Could anybody provide instructions or point me in the right direction? Cheers. UPDATE: Rake Task: task :fetch_flyers => :environment do require 'nokogiri'

I can't remove whitespaces from a string parsed by Nokogiri

心已入冬 submitted on 2019-12-04 13:13:26
Question: I can't remove whitespace from a string. My HTML is: <p class='your-price'> Cena pro Vás: <strong>139 <small>Kč</small></strong> </p> My code is: #encoding: utf-8 require 'rubygems' require 'mechanize' agent = Mechanize.new site = agent.get("http://www.astratex.cz/podlozky-pod-raminka/doplnky") price = site.search("//p[@class='your-price']/strong/text()") val = price.first.text => "139 " val.strip => "139 " val.gsub(" ", "") => "139 " gsub, strip, etc. don't work. Why, and how do I fix
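One common cause of this symptom, offered here as an assumption since the excerpt is cut off before any answer, is that the trailing character is a non-breaking space (U+00A0) rather than an ASCII space. `String#strip` and `gsub(" ", "")` only touch ASCII whitespace, whereas the POSIX bracket class `[[:space:]]` in a Ruby regexp also matches Unicode whitespace. A minimal sketch, with a hypothetical helper name and a hand-built stand-in for the scraped value:

```ruby
# Hypothetical stand-in for the scraped text: "139" followed by U+00A0.
# Whether the real page contains this character is an assumption.
SAMPLE_PRICE = "139\u00A0"

# Hypothetical helper: trims leading and trailing Unicode whitespace,
# including non-breaking spaces that String#strip leaves in place.
def strip_unicode_space(text)
  text.gsub(/\A[[:space:]]+|[[:space:]]+\z/, "")
end

puts SAMPLE_PRICE.strip.length                      # strip leaves the NBSP
puts strip_unicode_space(SAMPLE_PRICE).length       # NBSP removed
```

To check what the trailing character really is, `val.codepoints` lists the code points of the string; 160 would confirm a non-breaking space.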

How to get the raw HTML source code for a page by using Ruby or Nokogiri?

蓝咒 submitted on 2019-12-04 12:53:33
I'm using Nokogiri (Ruby XPath library) to extract content from web pages. Then I found problems with some web pages, such as Ajax web pages, and that means when I view source code I won't be seeing the exact contents such as <table>, etc. How can I get the HTML code for the actual content? Don't use Nokogiri at all if you want the raw source of a web page. Just fetch the web page directly as a string, and then do not feed that to Nokogiri. For example: require 'open-uri' html = URI.open('http://phrogz.net').read puts html.length #=> 8461 puts html #=> ...raw source of the page... If, on the other

How to add a new node to XML

℡╲_俬逩灬. submitted on 2019-12-04 11:41:28
Question: I have a simple XML file, items.xml: <?xml version="1.0" encoding="UTF-8" ?> <items> <item> <name>mouse</name> <manufacturer>Logicteh</manufacturer> </item> <item> <name>keyboard</name> <manufacturer>Logitech - Inc.</manufacturer> </item> <item> <name>webcam</name> <manufacturer>Logistech</manufacturer> </item> </items> I am trying to insert a new node with the following code: require 'rubygems' require 'nokogiri' f = File.open('items.xml') @items = Nokogiri::XML(f) f.close price = Nokogiri:

grabbing text between two elements in nokogiri?

扶醉桌前 submitted on 2019-12-04 11:25:40
<body> <div>some text</div> I NEED THIS TEXT ONLY <div>some text</div> more text here <div>some text</div> one more text here <div>some text</div> </body> How? Use: /*/div[1]/following-sibling::text()[1] This selects the first text-node sibling of the first div child of the top element of the document. This returns the first text node within body between two div elements: /body/text()[ ./preceding::element()[1][local-name()="div"] and ./following::element()[1][local-name()="div"] ][1] It should return I NEED THIS TEXT ONLY. This XPath 1.0: /body/text()[preceding-sibling::*[1][self::div]]

How to click link in Mechanize and Nokogiri?

老子叫甜甜 submitted on 2019-12-04 10:15:37
I'm using Mechanize to scrape Google Wallet for Order data. I am capturing all the data from the first page, however, I need to automatically link to subsequent pages to get more info. The #purchaseOrderPager-pagerNextButton will move to the next page so I can pick up more records to capture. The element looks like this. I need to click on it to keep going. <a id="purchaseOrderPager-pagerNextButton" class="kd-button small right" href="purchaseorderlist?startTime=0&... ;currentPageStart=1&currentPageEnd=25&inputFullText="> <img src="https://www.gstatic.com/mc3/purchaseorder/page-right.png"></a>

Convert XML collection (of Pivotal Tracker stories) to Ruby hash/object

杀马特。学长 韩版系。学妹 submitted on 2019-12-04 09:47:45
Question: I have a collection of stories in an XML format. I would like to parse the file and return each story as either a hash or a Ruby object, so that I can further manipulate the data within a Ruby script. Does Nokogiri support this, or is there a better tool/library to use? The XML document has the following structure, returned via Pivotal Tracker's web API: <?xml version="1.0" encoding="UTF-8"?> <stories type="array" count="145" total="145"> <story> <id type="integer">16376</id> <story_type>feature<

XPath to find all following siblings up until the next sibling of a particular type

旧巷老猫 submitted on 2019-12-04 09:44:03
Question: Given this XML/HTML: <dl> <dt>Label1</dt><dd>Value1</dd> <dt>Label2</dt><dd>Value2</dd> <dt>Label3</dt><dd>Value3a</dd><dd>Value3b</dd> <dt>Label4</dt><dd>Value4</dd> </dl> I want to find all <dt> and then, for each, find the following <dd> up until the next <dt>. Using Ruby's Nokogiri I am able to accomplish this like so: dl.xpath('dt').each do |dt| ct = dt.xpath('count(following-sibling::dt)') dds = dt.xpath("following-sibling::dd[count(following-sibling::dt)=#{ct}]") puts "#{dt.text}: #
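The same grouping can also be done outside XPath by walking the children of the <dl> in document order and starting a new group at each <dt>. The sketch below is an alternative to the question's XPath, not a reconstruction of it: it uses `Enumerable#slice_before`, and it swaps in Ruby's bundled REXML so that it runs without the Nokogiri gem (with Nokogiri, `dl.element_children` would play the role of `dl.elements.to_a`). The constant and helper names are hypothetical.

```ruby
require 'rexml/document'

# The <dl> from the question, inlined so the sketch is self-contained.
DL_SOURCE = <<~HTML
  <dl>
    <dt>Label1</dt><dd>Value1</dd>
    <dt>Label2</dt><dd>Value2</dd>
    <dt>Label3</dt><dd>Value3a</dd><dd>Value3b</dd>
    <dt>Label4</dt><dd>Value4</dd>
  </dl>
HTML

# Hypothetical helper: maps each <dt> label to the texts of the <dd>
# elements that follow it, up to the next <dt>.
def group_definitions(source)
  dl = REXML::Document.new(source).root
  dl.elements.to_a                              # child elements in document order
    .slice_before { |el| el.name == 'dt' }      # start a new group at every <dt>
    .map { |dt, *dds| [dt.text, dds.map(&:text)] }
    .to_h
end

group_definitions(DL_SOURCE).each do |label, values|
  puts "#{label}: #{values.join(', ')}"
end
```

This assumes the first child of the list is a <dt>; a leading <dd> would end up as the key of its own group.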

I need to scrape data from a facebook game - using ruby

China☆狼群 submitted on 2019-12-04 08:35:05
Revised (clarified question) I've spent a few days already trying to figure out how to scrape specific information from a Facebook game; however, I've run into brick wall after brick wall. As best as I can tell, the main problem is as follows. I can use Chrome's inspect element tool to manually find the HTML that I need - it appears nestled inside an iframe. However, when I try to scrape that iframe, it is empty (except for properties): <iframe id="game_frame" name="game_frame" src="" scrolling="no" ...></iframe> This is the same output that I see if I use a browser's "View page source" tool.

Building blank XML tags with Nokogiri?

邮差的信 submitted on 2019-12-04 08:25:32
I'm trying to build up an XML document using Nokogiri. Everything is pretty standard so far; most of my code just looks something like: builder = Nokogiri::XML::Builder.new do |xml| ... xml.Tag1(object.attribute_1) xml.Tag2(object.attribute_2) xml.Tag3(object.attribute_3) xml.Tag4(nil) end builder.to_xml However, that results in a tag like <Tag4/> instead of <Tag4></Tag4>, which is what my end user has specified that the output needs to be. How do I tell Nokogiri to put full tags around a nil value? SaveOptions::NO_EMPTY_TAGS will get you what you want. require 'nokogiri' builder = Nokogiri: