nokogiri | 易学教程 ("Easy-Learn Tutorials")

Nokogiri: Searching for <div> using XPath

Question: I use Nokogiri's (Ruby gem) CSS search to look for certain <div> elements in my HTML. It looks like Nokogiri's CSS search doesn't support regular expressions, so I would like to switch to Nokogiri's XPath search, which seems to support regex in search strings. How do I implement the (pseudo) CSS search mentioned below as an XPath search?

    require 'rubygems'
    require 'nokogiri'

    value = Nokogiri::HTML.parse(<<-HTML_END)
      "<html>
        <body>
          <p id='para-1'>A</p>
          <p id='para-22'>B</p>
          <h1>Bla</h1>
          <p id='para-3'>C</p>
          <p id='para …

Print an XML document without the XML header line at the top

I am trying to find out how to call to_xml on a Nokogiri::XML::Document or a Nokogiri::XML::DocumentFragment. Alternatively, I would like to use XPath on a Nokogiri::XML::DocumentFragment. I was unable to work out how to do that, although I am successfully parsing a Nokogiri::XML::Document. I later include a parsed and modified DocumentFragment in another piece of XML, and I keep getting bitten by what I thought would be really simple things, like calling to_xml on a document or fragment WITHOUT including the XML declaration line at the top. Why is this so hard?

The simplest way to …

How to prevent Nokogiri from adding <DOCTYPE> tags?

I noticed something strange while using Nokogiri recently. All of the HTML I had been parsing had been given opening and closing <html> and <body> tags, plus a DOCTYPE:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body>

How can I prevent Nokogiri from doing this? That is, when I do:

    doc = Nokogiri::HTML("<div>some content</div>")
    doc.to_s

or:

    doc.to_html

instead of the original markup I get:

    <html blah><body><div>some content</div></body></html>

The problem occurs because you're using the wrong method in Nokogiri to parse your content.

    require 'nokogiri'
    doc = …

Find and replace HTML tags

I have the following HTML:

    <html>
      <body>
        <h1>Foo</h1>
        <p>The quick brown fox.</p>
        <h1>Bar</h1>
        <p>Jumps over the lazy dog.</p>
      </body>
    </html>

I'd like to change it into the following HTML:

    <html>
      <body>
        <p class="title">Foo</p>
        <p>The quick brown fox.</p>
        <p class="title">Bar</p>
        <p>Jumps over the lazy dog.</p>
      </body>
    </html>

How can I find and replace certain HTML tags? I can use the Nokogiri gem.

Try this:

    require 'nokogiri'

    html_text = "<html><body><h1>Foo</h1><p>The quick brown fox.</p><h1>Bar</h1><p>Jumps over the lazy dog.</p></body></html>"
    frag = Nokogiri::HTML(html_text)
    frag.xpath( …

WARNING: Nokogiri was built against LibXML version 2.7.3, but has dynamically loaded 2.7.8

After a fresh install of Mac OS X 10.8 Mountain Lion, and after installing Ruby 1.9.3 and Ruby on Rails 3.2.6, I started the Rails console and got this warning message:

    WARNING: Nokogiri was built against LibXML version 2.7.3, but has dynamically loaded 2.7.8

How can I fix it?

I reinstalled Ruby, and that fixed it; I was able to use the built-in libraries.

I have found some fixes for Lion, but none for Mountain Lion yet. Nonetheless, I tried this and it works:

    gem uninstall nokogiri libxml-ruby
    brew update
    brew uninstall libxml2
    brew install libxml2 --with-xml2-config
    brew link libxml2

How do I get Nokogiri to understand my namespaces?

Question: I have the following XML document:

    <samlp:LogoutRequest ID="123456789" Version="2.0" IssueInstant="200904051217">
      <saml:NameID>@NOT_USED@</saml:NameID>
      <samlp:SessionIndex>abcdefg</samlp:SessionIndex>
    </samlp:LogoutRequest>

I'd like to get the content of the SessionIndex element (that is, 'abcdefg') out of it. I've tried this:

    XPATH_QUERY = "LogoutRequest[@ID][@Version='2.0'][IssueInstant]/SessionIndex"
    SAML_XMLNS = 'urn:oasis:names:tc:SAML:2.0:assertion'
    SAMLP_XMLNS = 'urn:oasis:names:tc:SAML:2.0 …

Save all image files from a website

Question: I'm creating a small app for myself in which a Ruby script saves all of the images from my blog. I can't figure out how to save the image files after I've identified them. Any help would be much appreciated.

    require 'rubygems'
    require 'nokogiri'
    require 'open-uri'

    url = '[my blog url]'
    doc = Nokogiri::HTML(URI.open(url)) # URI.open: Kernel#open no longer opens URLs in Ruby 3+
    doc.css("img").each do |item|
      # something
    end

Answer 1:

    URL = '[my blog url]'

    require 'nokogiri' # gem install nokogiri
    require 'open-uri' # already part of your ruby install …

OS X Lion, Attempting Nokogiri install - libxml2 is missing

    sudo gem install nokogiri
    Building native extensions. This could take a while...
    ERROR: Error installing nokogiri:
    ERROR: Failed to build gem native extension.

    /Users/sajeev86/.rvm/rubies/ruby-1.8.7-p352/bin/ruby extconf.rb
    checking for libxml/parser.h... no
    -----
    libxml2 is missing. please visit http://nokogiri.org/tutorials/installing_nokogiri.html for help with installing dependencies.
    -----
    *** extconf.rb failed ***
    Could not create Makefile due to some reason, probably lack of
    necessary libraries and/or headers. Check the mkmf.log file for more
    details. You may need configuration options.

extract single string from HTML using Ruby/Mechanize (and Nokogiri)

I am extracting data from a forum. My script, based on …, is working fine. Now I need to extract the date and time (21 Dec 2009, 20:39) from a single post, but I cannot get it to work. I used FireXPath to determine the XPath. Sample code:

    require 'rubygems'
    require 'mechanize'

    post_agent = Mechanize.new # Mechanize 1.0+; older releases used WWW::Mechanize
    post_page = post_agent.get('http://www.vbulletin.org/forum/showthread.php?t=230708')
    puts post_page.parser.xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div[2]/text()').to_s.strip
    puts post_page.parser.at_xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div[2]/text()').to_s.strip
    puts post …

nokogiri will not install - ERROR: Failed to build gem native extension [duplicate]

This question already has an answer here: `require': no such file to load — mkmf (LoadError) (9 answers)

On Ubuntu 12.04 I get the following:

    sudo apt-get install libxml2 libxml2-dev libxslt libxslt-dev
    sudo gem install nokogiri
    Building native extensions. This could take a while...
    ERROR: Error installing nokogiri:
    ERROR: Failed to build gem native extension.

    /usr/bin/ruby1.9.1 extconf.rb
    /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require': cannot load such file -- mkmf (LoadError)
        from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from extconf.rb:5:in `<main>'
    Gem …