nokogiri

Nokogiri builder performance on huge XML?

放肆的年华 submitted on 2019-12-11 01:09:52
Question: I need to build a huge XML file, about 1-50 MB. I thought that using the builder would be efficient enough and, well, it is, somewhat. The problem is that after the program reaches its last line it doesn't end immediately; Ruby is still doing something for several seconds, maybe garbage collection? After that the program finally ends. To give a real example, I measured the time taken to build an XML file. It outputs 55 seconds (there is a database behind it, so it takes long) when the XML was built,

Parsing large HTML files with Nokogiri

时光毁灭记忆、已成空白 submitted on 2019-12-11 00:42:44
Question: I'm trying to parse http://www.pro-medic.ru/index.php?ht=246&perpage=all with Nokogiri, but unfortunately I can't get all items from the page. My simple test code is: require 'open-uri' require 'nokogiri' html = Nokogiri::HTML open('http://www.pro-medic.ru/index.php?ht=246&perpage=all') p html.css('ul.products-grid-compact li .goods_container').count It returns only 83 items but the real count is about 186. I thought that the problem could be in open, but it seems that function reads the

Merge two XML files in Nokogiri

為{幸葍}努か submitted on 2019-12-10 22:07:12
Question: There are some posts about this topic, but I wasn't able to figure out how to solve my problem. I have two XML files: <Products> <Product> <some> <value></value> </some> </Product> <Product> <more> <data></data> </more> </Product> </Products> And: <Products> <Product> <some_other> <value></value> </some_other> </Product> </Products> I want to generate an XML document that looks like this: <Products> <Product> <some> <value></value> </some> </Product> <Product> <more> <data></data> </more> <

Error when installing nokogiri on mac with "sudo gem install nokogiri"

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-10 21:35:31
Question: I was trying to install nokogiri because it is required for Rails to start: $ rails s /usr/local/rvm/gems/ruby-1.9.3-p194@global/gems/bundler-1.1.5/lib/bundler/spec_set.rb:90:in `block in materialize': Could not find nokogiri-1.5.5 in any of the sources (Bundler::GemNotFound) from /usr/local/rvm/gems/ruby-1.9.3-p194@global/gems/bundler-1.1.5/lib/bundler/spec_set.rb:83:in `map!' from /usr/local/rvm/gems/ruby-1.9.3-p194@global/gems/bundler-1.1.5/lib/bundler/spec_set.rb:83:in `materialize'

HTML Parser into DOM in Ruby

感情迁移 submitted on 2019-12-10 18:55:58
Question: Is there any HTML parser in Ruby that reads an HTML document into a DOM tree and represents HTML tags as DOM elements? I know Nokogiri but it doesn't parse HTML into a DOM tree. Answer 1: Despite your remark, Nokogiri is the way to go: doc = Nokogiri::HTML('<body><p>Hello, worlds!</body>') It parses even invalid HTML and returns a DOM tree: >> doc.class => Nokogiri::HTML::Document >> doc.root.class => Nokogiri::XML::Element >> doc.root.children.class => Nokogiri::XML::NodeSet >> doc.root.children.first

Nokogiri native extension failing to build (Not libxml2 or libxslt missing issue)

烈酒焚心 submitted on 2019-12-10 18:51:22
Question: As the title says, it doesn't seem to be failing because libxml2 or libxslt is missing. I'm not really sure what to make of the error. (Get it? Because the issue is during make? hehe...) Anywho, here's the output I'm getting. Any ideas would be appreciated: Building native extensions. This could take a while... ERROR: Error installing nokogiri: ERROR: Failed to build gem native extension. /usr/bin/ruby1.9.1 extconf.rb extconf.rb:10: Use RbConfig instead of obsolete and deprecated Config.

XPath to select preceding element with optional intervening whitespace-only text node

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-10 18:32:15
Question: Given an element as context, I want to select the preceding sibling element and check whether it has a particular name. The caveat is that I do not want to select it if there is an intervening text node that has non-whitespace content. For example, given this XML document… <r> <a>a1</a><a>a2</a> b <a>a3</a> <a>a4</a> <b/> <a>a5</a> </r> …then: For "a1" there should be no match (there is no <a> sibling element immediately preceding it); for "a2" then "a1" should be matched (there is no

When using Nokogiri, how do you suppress the insertion of self-closing tags?

穿精又带淫゛_ submitted on 2019-12-10 18:15:58
Question: My XML doc, which is the config file for a Jenkins job, has a lot of empty tags like: <string></string> which Nokogiri replaces with: <string/> While this is the "recommended" way to write XML, it ends up generating unnecessary changes to the XML that make it difficult to read the meaningful content changes. Is there a way to suppress this behavior? Answer 1: You can use the NO_EMPTY_TAGS option: doc.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS) or the rather more concise:

libxml2 missing when installing nokogiri's gem devkit (windows)

ⅰ亾dé卋堺 submitted on 2019-12-10 17:35:07
Question: I've been experiencing a lot of problems with this gem, so I read that I had to use the 1.5.0-beta. Since I'm using Windows, I downloaded DevKit. But when I ran: gem install nokogiri --pre -- --with-xml2-lib --with-xslt-lib I get: Temporarily enhancing PATH to include DevKit... Building native extensions. This could take a while... ERROR: Error installing nokogiri: ERROR: Failed to build gem native extension. C:/Ruby187/bin/ruby.exe extconf.rb --with-xml2-lib --with-xslt-lib --pla taform=ruby

Use Nokogiri to get all nodes in an element that contain a specific attribute name

早过忘川 submitted on 2019-12-10 17:27:10
Question: I'd like to use Nokogiri to extract all nodes in an element that contain a specific attribute name. E.g., I'd like to find the 2 nodes that contain the attribute "blah" in the document below. @doc = Nokogiri::HTML::DocumentFragment.parse <<-EOHTML <body> <h1 blah="afadf">Three's Company</h1> <div>A love triangle.</div> <b blah="adfadf">test test test</b> </body> EOHTML I found this suggestion (below) at this website: http://snippets.dzone.com/posts/show/7994, but it doesn't return the 2 nodes