nokogiri | 易学教程

Strip text from HTML document using Ruby

Question: There are many examples of how to strip HTML tags from a document using Ruby; Hpricot and Nokogiri have inner_text methods that remove all HTML for you easily and quickly. What I am trying to do is the opposite: remove all the text from an HTML document, leaving just the tags and their attributes. I considered looping through the document and setting inner_html to nil, but then you would really have to do this in reverse, as the first element (root) has an inner_html of the entire rest of the

Find and replace entire HTML nodes with Nokogiri

Question: I have an HTML document that needs to be transformed by replacing some tags with other tags. I don't know these tags in advance, because they will come from the database, so Nokogiri's set_attribute or name methods are not suitable for me. I need to do it in a way like this pseudo-code: def preprocess_content doc = Nokogiri::HTML( self.content ) doc.css("div.to-replace").each do |div| # "get_html_text" will obtain HTML from the db. It can be anything, even other tags, tag groups etc. div.replace self.get

trying to get content inside cdata tags in xml file using nokogiri

I have seen several posts on this, but nothing has worked so far. I am parsing XML from a URL using Nokogiri on Rails 3 with Ruby 1.9.2. A snippet of the XML looks like this: <NewsLineText> <![CDATA[ Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee. ]]> </NewsLineText> I am trying to parse this to get the text associated with NewsLineText: r = node.at_xpath('.//newslinetext') if node.at_xpath('.//newslinetext') s = node.at_xpath('.//newslinetext').text if node.at_xpath('.//newslinetext') t = node.at_xpath('.//newslinetext').content

Nokogiri and Xpath: find all text between two tags

I'm not sure if it's a matter of syntax or of version differences, but I can't figure this out. I want to take the data inside a (non-closing) td, from the h2 tag up to the h3 tag. Here is what the HTML looks like: <td valign="top" width="350"> <br><h2>NameIWant</h2><br> <br>Town<br> PhoneNumber<br> <a href="mailto:emailIwant@nowhere.com" class="links">emailIwant@nowhere.com</a> <br> <a href="http://websiteIwant.com" class="links">websiteIwant.com</a> <br><br> <br><img src="images/spacer.gif"/><br> <h3><b>I want to stop before this!</b></h3> Lorem Ipsum Yadda Yadda<br> <img src=

Using Nokogiri to Split Content on BR tags

I have a snippet of code I'm trying to parse with Nokogiri that looks like this: <td class="j"> <a title="title text1" href="http://link1.com">Link 1</a> (info1), Blah 1,<br> <a title="title text2" href="http://link2.com">Link 2</a> (info1), Blah 1,<br> <a title="title text2" href="http://link3.com">Link 3</a> (info2), Blah 1 Foo 2,<br> </td> I have access to the source of the td.j using something like this: data_items = doc.css("td.j") My goal is to split each of those lines up into an array of hashes. The only logical splitting point I can see is to split on the BRs and then use some regex on

Install Nokogiri 1.6.1 under Ruby 2.0.0p353 (rvm based installation) fails (OSX Mavericks)?

I've tried to install Nokogiri 1.6.1 under Ruby and RVM, but it fails with the following error: Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension. /Users/lmo0/.rvm/rubies/ruby-2.0.0-p353/bin/ruby extconf.rb Extracting libxml2-2.8.0.tar.gz into tmp/x86_64-apple-darwin13.0.0/ports/libxml2/2.8.0... OK Running 'configure' for libxml2 2.8.0... OK Running 'compile' for libxml2 2.8.0... OK Running 'install' for libxml2 2.8.0... OK Activating libxml2 2.8.0 (from /Users/lmo0/.rvm/gems/ruby-2.0.0-p353/gems/nokogiri-1.6.1/ports/x86_64-apple-darwin13.0.0/libxml2/2.8.0)...

Nokogiri: Select content between element A and B

Question: What's the smartest way to have Nokogiri select all content between the start and the stop element (including start-/stop-element)? Check example code below to understand what I'm looking for: require 'rubygems' require 'nokogiri' value = Nokogiri::HTML.parse(<<-HTML_END) "<html> <body> <p id='para-1'>A</p> <div class='block' id='X1'> <p class="this">Foo</p> <p id='para-2'>B</p> </div> <p id='para-3'>C</p> <p class="that">Bar</p> <p id='para-4'>D</p> <p id='para-5'>E</p> <div class='block' id

Nokogiri: Searching for <div> using XPath

I use Nokogiri's (Rubygem) CSS search to look for certain <div> elements inside my HTML. It looks like Nokogiri's CSS search doesn't like regex. I would like to switch to Nokogiri's XPath search, as this seems to support regex in search strings. How do I implement the (pseudo) CSS search mentioned below as an XPath search? require 'rubygems' require 'nokogiri' value = Nokogiri::HTML.parse(<<-HTML_END) "<html> <body> <p id='para-1'>A</p> <p id='para-22'>B</p> <h1>Bla</h1> <p id='para-3'>C</p> <p id='para-4'>D</p> <div class="foo" id="eq-1_bl-1"> <p id='para-5'>3</p> </div> </body> </html>" HTML_END # my

Creating an XML document with a namespaced root element with Nokogiri builder

Question: I'm implementing an exporter for an XML data format that requires namespaces. I'm using the Nokogiri XML Builder (version 1.4.0) to do this. However, I can't get Nokogiri to create a root node with a namespace. This works: Nokogiri::XML::Builder.new { |xml| xml.root('xmlns:foobar' => 'my-ns-url') }.to_xml <?xml version="1.0"?> <root xmlns:foobar="my-ns-url"/> As does this: Nokogiri::XML::Builder.new do |xml| xml.root('xmlns:foobar' => 'my-ns-url') { xml['foobar'].child } end.to_xml <?xml

Nokogiri to_xml without carriage returns

Question: I'm currently using the Nokogiri::XML::Builder class to construct an XML document, then calling .to_xml on it. The resulting string always contains a bunch of spaces, linefeeds and carriage returns in between the nodes, and I can't for the life of me figure out how to get rid of them. Here's an example: b = Nokogiri::XML::Builder.new do |xml| xml.root do xml.text("Value") end end b.to_xml This results in the following: <?xml version="1.0"?> <root>Value</root> What I want is this (notice the