nokogiri

Nokogiri: Searching for <div> using XPath

风流意气都作罢 submitted on 2019-11-28 01:06:20
Question: I use Nokogiri's (RubyGem) CSS search to look for certain <div> elements inside my HTML. It looks like Nokogiri's CSS search doesn't like regex. I would like to switch to Nokogiri's XPath search, as this seems to support regex in search strings. How do I implement the (pseudo) CSS search mentioned below as an XPath search? require 'rubygems' require 'nokogiri' value = Nokogiri::HTML.parse(<<-HTML_END) "<html> <body> <p id='para-1'>A</p> <p id='para-22'>B</p> <h1>Bla</h1> <p id='para-3'>C</p> <p id='para

Print an XML document without the XML header line at the top

Deadly submitted on 2019-11-28 00:58:45
I am just trying to find out how to do a to_xml with a Nokogiri::XML::Document or a Nokogiri::XML::DocumentFragment. Alternatively, I would like to use XPath on a Nokogiri::XML::DocumentFragment. I was unable to ascertain how to do that; however, I am successfully parsing a Nokogiri::XML::Document. I am later including a parsed and modified DocumentFragment into another piece of XML, but I'm really getting bitten on what I thought would be some really simple things. Like trying to do a to_xml on a doc or docfrag, and NOT INCLUDING that XML line at the top. Why so hard? The simplest way to

How to prevent Nokogiri from adding <DOCTYPE> tags?

爱⌒轻易说出口 submitted on 2019-11-27 21:52:55
I noticed something strange using Nokogiri recently. All of the HTML I had been parsing had been given start and end <html> and <body> tags. <!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body>\n How can I prevent Nokogiri from doing this? I.e., when I do: doc = Nokogiri::HTML("<div>some content</div>") doc.to_s or: doc.to_html I get the original: <html blah><body><div>some content</div></body></html> The problem occurs because you're using the wrong method in Nokogiri to parse your content. require 'nokogiri' doc =

Find and replace HTML tags

淺唱寂寞╮ submitted on 2019-11-27 21:25:11
I have the following HTML: <html> <body> <h1>Foo</h1> <p>The quick brown fox.</p> <h1>Bar</h1> <p>Jumps over the lazy dog.</p> </body> </html> I'd like to change it into the following HTML: <html> <body> <p class="title">Foo</p> <p>The quick brown fox.</p> <p class="title">Bar</p> <p>Jumps over the lazy dog.</p> </body> </html> How can I find and replace certain HTML tags? I can use the Nokogiri gem. Try this: require 'nokogiri' html_text = "<html><body><h1>Foo</h1><p>The quick brown fox.</p><h1>Bar</h1><p>Jumps over the lazy dog.</p></body></html>" frag = Nokogiri::HTML(html_text) frag.xpath(

WARNING: Nokogiri was built against LibXML version 2.7.3, but has dynamically loaded 2.7.8

时光怂恿深爱的人放手 submitted on 2019-11-27 20:37:06
After making a fresh install of Mac OS X 10.8 Mountain Lion, and after installing Ruby 1.9.3 and Ruby on Rails 3.2.6, I started the Rails console and I got this warning message: WARNING: Nokogiri was built against LibXML version 2.7.3, but has dynamically loaded 2.7.8 How can I fix it? I reinstalled Ruby, that fixed it. Was able to use the built-in libraries. I have found some fixes for Lion, but none for Mountain Lion yet. Nonetheless I have tried this and it works: gem uninstall nokogiri libxml-ruby brew update brew uninstall libxml2 brew install libxml2 --with-xml2-config brew link libxml2

How do I get Nokogiri to understand my namespaces?

天涯浪子 submitted on 2019-11-27 18:44:24
Question: I have the following XML document: <samlp:LogoutRequest ID="123456789" Version="2.0" IssueInstant="200904051217"> <saml:NameID>@NOT_USED@</saml:NameID> <samlp:SessionIndex>abcdefg</samlp:SessionIndex> </samlp:LogoutRequest> I'd like to get the content of the SessionIndex (that is, 'abcdefg') out of it. I've tried this: XPATH_QUERY = "LogoutRequest[@ID][@Version='2.0'][IssueInstant]/SessionIndex" SAML_XMLNS = 'urn:oasis:names:tc:SAML:2.0:assertion' SAMLP_XMLNS = 'urn:oasis:names:tc:SAML:2.0

Save all image files from a website

半世苍凉 submitted on 2019-11-27 18:36:57
Question: I'm creating a small app for myself where I run a Ruby script and save all of the images off of my blog. I can't figure out how to save the image files after I've identified them. Any help would be much appreciated. require 'rubygems' require 'nokogiri' require 'open-uri' url = '[my blog url]' doc = Nokogiri::HTML(open(url)) doc.css("img").each do |item| #something end Answer 1: URL = '[my blog url]' require 'nokogiri' # gem install nokogiri require 'open-uri' # already part of your ruby install

OS X Lion, Attempting Nokogiri install - libxml2 is missing

霸气de小男生 submitted on 2019-11-27 18:08:57
sudo gem install nokogiri Building native extensions. This could take a while... ERROR: Error installing nokogiri: ERROR: Failed to build gem native extension. /Users/sajeev86/.rvm/rubies/ruby-1.8.7-p352/bin/ruby extconf.rb checking for libxml/parser.h... no ----- libxml2 is missing. please visit http://nokogiri.org/tutorials/installing_nokogiri.html for help with installing dependencies. ----- *** extconf.rb failed *** Could not create Makefile due to some reason, probably lack of necessary libraries and/or headers. Check the mkmf.log file for more details. You may need configuration options.

extract single string from HTML using Ruby/Mechanize (and Nokogiri)

ぃ、小莉子 submitted on 2019-11-27 15:23:06
I am extracting data from a forum. My script is working fine. Now I need to extract the date and time (21 Dec 2009, 20:39) from a single post. I cannot get it to work. I used FireXPath to determine the XPath. Sample code: require 'rubygems' require 'mechanize' post_agent = WWW::Mechanize.new post_page = post_agent.get('http://www.vbulletin.org/forum/showthread.php?t=230708') puts post_page.parser.xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div[2]/text()').to_s.strip puts post_page.parser.at_xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div[2]/text()').to_s.strip puts post

nokogiri will not install - ERROR: Failed to build gem native extension [duplicate]

不打扰是莪最后的温柔 submitted on 2019-11-27 14:44:55
This question already has an answer here: `require': no such file to load — mkmf (LoadError) (9 answers). On Ubuntu 12.04 I get the output below. sudo apt-get install libxml2 libxml2-dev libxslt libxslt-dev sudo gem install nokogiri Building native extensions. This could take a while... ERROR: Error installing nokogiri: ERROR: Failed to build gem native extension. /usr/bin/ruby1.9.1 extconf.rb /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require': cannot load such file -- mkmf (LoadError) from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require' from extconf.rb:5:in `<main>' Gem