nokogiri

How can I get the absolute URL when extracting links using Nokogiri?

余生颓废 submitted on 2019-11-26 18:02:32
Question: I'm using Nokogiri to extract links from a page, but I would like to get the absolute path even when the link on the page is relative. How can I accomplish this?

Answer 1: Nokogiri is unrelated, other than the fact that it gives you the link anchor to begin with. Use Ruby's URI library to manage paths:

    absolute_uri = URI.join( page_url, href ).to_s

Seen in action:

    require 'uri'
    # The URL of the page with the links
    page_url = 'http://foo.com/zee/zaw/zoom.html'
    # A variety of links to test.
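The answer's own test list is cut off here, so the following is a minimal sketch of the same `URI.join` idea. The hrefs are made-up test values, not the answer's originals. It covers the usual kinds of link: already absolute, root-relative, relative, parent-directory, and fragment-only.

```ruby
require 'uri'

# The page the links were found on (same base URL as in the answer).
page_url = 'http://foo.com/zee/zaw/zoom.html'

# Hypothetical hrefs as they might appear in <a href="..."> attributes.
hrefs = [
  'http://zork.com/', # already absolute: returned unchanged
  '/foo/bar.html',    # root-relative: replaces the whole path
  'bar.html',         # relative: resolved against /zee/zaw/
  '../up.html',       # climbs one directory from /zee/zaw/
  '#anchor'           # fragment only: keeps the page path
]

# Resolve each href against the page URL.
absolute = hrefs.map { |href| URI.join(page_url, href).to_s }
absolute.each { |url| puts url }
```

In a Nokogiri scraper the hrefs would typically come from something like `doc.css('a[href]').map { |a| a['href'] }`. Note that `URI.join` raises `URI::InvalidURIError` for hrefs that are not valid URIs, such as ones containing raw spaces, so real-world link lists may need a `rescue`.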

Nokogiri/XPath namespace query

非 Y 不嫁゛ submitted on 2019-11-26 17:40:23
I'm trying to pull out the dc:title element using XPath. I can pull out the metadata using the following code:

    require 'nokogiri'

    doc = <<END
    <?xml version="1.0" encoding="UTF-8"?>
    <package xmlns="http://www.idpf.org/2007/opf" version="2.0">
      <metadata xmlns:dc="URI">
        <dc:title>title text</dc:title>
      </metadata>
    </package>
    END

    doc = Nokogiri::XML(doc)

    # Awesome this works!
    puts '//xmlns:metadata'
    puts doc.xpath('//xmlns:metadata')
    # => <metadata xmlns:dc="URI"><dc:title>title text</dc:title></metadata>

As you can see, the above appears to work correctly. However, I don't seem to be able to get the title

nokogiri will not install - ERROR: Failed to build gem native extension [duplicate]

我是研究僧i submitted on 2019-11-26 16:54:07
Question: This question already has answers here: `require': no such file to load — mkmf (LoadError) (9 answers). Closed 6 years ago.

On Ubuntu 12.04 I get the following:

    sudo apt-get install libxml2 libxml2-dev libxslt libxslt-dev
    sudo gem install nokogiri
    Building native extensions. This could take a while...
    ERROR: Error installing nokogiri:
    ERROR: Failed to build gem native extension.
    /usr/bin/ruby1.9.1 extconf.rb
    /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require': cannot load such file
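The duplicate's title points at a missing `mkmf` library. On Debian and Ubuntu the Ruby interpreter and its development files are packaged separately. The usual remedy is to install the matching dev package, which for this Ruby is typically `ruby1.9.1-dev`. Before changing packages, you can check from Ruby itself whether `mkmf.rb` and the C headers exist where the running interpreter expects them. The helper name `native_build_prereqs` is hypothetical, and it only inspects the standard `RbConfig` directories.

```ruby
require 'rbconfig'

# Hypothetical helper: report whether the files Ruby needs to build
# native extensions are present for the interpreter running this script.
def native_build_prereqs
  lib_dir = RbConfig::CONFIG['rubylibdir'].to_s # where mkmf.rb normally lives
  hdr_dir = RbConfig::CONFIG['rubyhdrdir'].to_s # where ruby.h normally lives
  {
    mkmf: File.exist?(File.join(lib_dir, 'mkmf.rb')),
    ruby_h: File.exist?(File.join(hdr_dir, 'ruby.h'))
  }
end

native_build_prereqs.each do |name, ok|
  puts "#{name}: #{ok ? 'found' : 'missing'}"
end
```

If either entry reports `missing`, extension builds such as Nokogiri's are expected to fail regardless of which libxml2 or libxslt packages are installed.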

Error installing Nokogiri on OS X 10.9 Mavericks?

[亡魂溺海] submitted on 2019-11-26 15:39:33
I upgraded my OS X from Lion to Mavericks, and now I can't install Nokogiri for my projects. I have already installed Xcode 5.0.1 and the Command Line Tools (using xcode-select --install), and installed libxml2 from Homebrew, but I am still having problems. The error is:

    Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension.
    /Users/ericcamalionte/.rvm/rubies/ruby-1.9.2-p320/bin/ruby extconf.rb
    checking for libxml/parser.h... *** extconf.rb failed ***
    Could not create Makefile due to some reason, probably lack of necessary libraries and/or headers. Check the mkmf.log file for more

XPath axis, get all following nodes until

泄露秘密 submitted on 2019-11-26 13:59:23
Question: I have the following example of HTML:

    <!-- lots of html -->
    <h2>Foo bar</h2>
    <p>lorem</p>
    <p>ipsum</p>
    <p>etc</p>
    <h2>Bar baz</h2>
    <p>dum dum dum</p>
    <p>poopfiddles</p>
    <!-- lots more html ... -->

I'm looking to extract all paragraphs following the 'Foo bar' header until I reach the 'Bar baz' header (the text of the 'Bar baz' header is unknown, so unfortunately I can't use the answer provided by bougyman). Now, I can of course use something like

    //h2[text()='Foo bar']/following::p

but that

How do I pretty-print HTML with Nokogiri?

懵懂的女人 submitted on 2019-11-26 13:58:14
Question: I wrote a web crawler in Ruby and I'm using Nokogiri::HTML to parse the page. I need to print the page out, and while messing around in IRB I noticed a pretty_print method. However, it takes a parameter and I can't figure out what it wants. My crawler caches the HTML of the web pages and writes it to files on my local machine. I would like to "pretty print" the HTML so that it looks nice and is properly formatted when I do so.

Answer 1: By "pretty printing" of an HTML page I presume you meant that you

HTML-parser on Node.js [closed]

守給你的承諾、 submitted on 2019-11-26 13:53:17
Is there something like Ruby's Nokogiri for Node.js? I mean a user-friendly HTML parser. I've seen some parsers on the Node.js modules page, but I can't find anything nice and up to date.

Answer by Farid Nouri Neshat: If you want to build a DOM, you can use jsdom. There's also cheerio, which has a jQuery-like interface and is a lot faster than older versions of jsdom, although these days the two are similar in performance. You might want to have a look at htmlparser2, which is a streaming parser; according to its benchmark it seems to be faster than the others, and it builds no DOM by default. It can also produce a DOM, as it is

Nokogiri, open-uri, and Unicode Characters

不打扰是莪最后的温柔 submitted on 2019-11-26 12:40:40
Question: I'm using Nokogiri and open-uri to grab the contents of the title tag on a web page, but I'm having trouble with accented characters. What's the best way to deal with these? Here's what I'm doing:

    require 'open-uri'
    require 'nokogiri'
    doc = Nokogiri::HTML(open(link))
    title = doc.at_css("title")

At this point, the title looks like this:

    Rag\303\271

Instead of:

    Ragù

How can I have Nokogiri return the proper character (e.g. ù in this case)? Here's an example URL: http://www.epicurious

nokogiri gem installation error

倖福魔咒の submitted on 2019-11-26 12:00:22
Question: I know there are a lot of questions about this gem, but no answer has worked for me. When I run gem install nokogiri over SSH, I get this error:

    Extracting libxml2-2.8.0.tar.gz into tmp/x86_64-unknown-linux-gnu/ports/libxml2/2.8.0... OK
    Running patch with /home/user58952277/.gem/ruby/1.9.3/gems/nokogiri-1.6.2.1/ports/patches/libxml2/0001-Fix-parser-local-buffers-size-problems.patch...
    Running 'patch' for libxml2 2.8.0... ERROR, review 'tmp/x86_64-unknown-linux-gnu/ports/libxml2/2.8.0/patch.log

Installing Nokogiri on OSX 10.10 Yosemite

拟墨画扇 submitted on 2019-11-26 10:10:20
Question: I recently upgraded to the 10.10 Yosemite beta, but I'm having trouble getting Nokogiri installed. I'm using RVM and Ruby 1.9.3. I've also followed the steps here and tried following the instructions on Nokogiri's homepage. I've installed libxml2 (2.9.1) and libxslt (1.1.28) via Homebrew, and have tried using the command line tools from both my Xcode 5 install and the Xcode 6 beta.

    gem install nokogiri -v '1.5.5'
    Building native extensions. This could take a while...
    ERROR: Error installing