nokogiri

Adding namespace using Nokogiri's XML Builder

青春壹個敷衍的年華 submitted on 2019-12-18 03:01:29
Question: I have been racking my brain for a few hours, but I can't work out how to add an XMLNS namespace while using the Nokogiri XML Builder class to construct an XML structure. For instance, consider the XML sample below: I can create everything between the GetQuote tags, but creating the "p:ACMRequest" remains a mystery. I came across this reference, https://gist.github.com/428455/7a15f84cc08c05b73fcec2af49947d458ae3b96a, but it still doesn't make sense to me. Even referring to the XML

Getting viewable text words via Nokogiri

妖精的绣舞 submitted on 2019-12-18 02:54:25
Question: I'd like to open a web page with Nokogiri, extract all the words that a user sees when they visit the page in a browser, and analyze the word frequency. What is the easiest way of getting all readable words out of an HTML document with Nokogiri? The ideal code snippet would take an HTML page (as a file, say) and give an array of individual words that come from all types of elements that are readable. (No need to worry about JavaScript or CSS hiding elements and thus hiding words; just all

Error installing nokogiri 1.6.0 on mac (libxml2)

南笙酒味 submitted on 2019-12-17 23:27:35
Question: UPDATE: Fixed. I found the answer in another thread. The workaround I used is to tell Nokogiri to use the system libraries instead: NOKOGIRI_USE_SYSTEM_LIBRARIES=1 bundle install ==== Trying to install nokogiri 1.6.0 on a Mac. With previous versions, I had no problems, but 1.6.0 refuses to install. This is the error: Building native extensions. This could take a while... ERROR: Error installing nokogiri: ERROR: Failed to build gem native extension. /Users/josenriq/.rvm/rubies/ruby-1.9.3-head

Why doesn't Nokogiri xpath like xmlns declarations

ε祈祈猫儿з submitted on 2019-12-17 23:24:27
Question: I'm using Nokogiri::XML to parse responses from Amazon SimpleDB. The response is something like: <SelectResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07/"> <SelectResult> <Item> <Attribute><Name>Foo</Name><Value>42</Value></Attribute> <Attribute><Name>Bar</Name><Value>XYZ</Value></Attribute> </Item> </SelectResult> </SelectResponse> If I just hand the response straight over to Nokogiri, all XPath queries (e.g. doc/"//Item/Attribute[Name='Foo']/Value" ) return an empty array. But if I

Strip style attributes with nokogiri

被刻印的时光 ゝ submitted on 2019-12-17 22:40:47
Question: I'm scraping an HTML page with Nokogiri and I want to strip out all style attributes. How can I achieve this? (I'm not using Rails, so I can't use its sanitize method, and I don't want to use the sanitize gem because I want to remove by blacklist, not by whitelist.) html = open(url) doc = Nokogiri::HTML(html.read) doc.css('.post').each do |post| puts post.to_s end => <p><span style="font-size: x-large">bla bla <a href="http://torrentfreak.com/netflix-is-killing-bittorrent-in-the-us-110427/">statistica</a>

How can I create a nokogiri case insensitive Xpath selector?

不想你离开。 submitted on 2019-12-17 18:35:33
Question: I'm using Nokogiri to select the 'keywords' attribute like this: puts page.parser.xpath("//meta[@name='keywords']").to_html One of the pages I'm working with has the keywords label with a capital "K", which has motivated me to make the query case insensitive: <meta name="keywords"> AND <meta name="Keywords"> So, my question is: what is the best way to make a Nokogiri selection case insensitive? EDIT: Tomalak's suggestion below works great for this specific problem. I'd like to also use this

Print an XML document without the XML header line at the top

百般思念 submitted on 2019-12-17 16:46:06
Question: I am just trying to find out how to do a to_xml with a Nokogiri::XML::Document or a Nokogiri::XML::DocumentFragment . Alternatively, I would like to use XPath on a Nokogiri::XML::DocumentFragment . I was unable to ascertain how to do that; however, I am successfully parsing a Nokogiri::XML::Document . I am later including a parsed and modified DocumentFragment in another piece of XML, but I'm really getting bitten by what I thought would be some really simple things. Like trying to do a to

WARNING: Nokogiri was built against LibXML version 2.7.3, but has dynamically loaded 2.7.8

别等时光非礼了梦想. submitted on 2019-12-17 15:55:17
Question: After making a fresh install of Mac OS X 10.8 Mountain Lion, and after installing Ruby 1.9.3 and Ruby on Rails 3.2.6, I started the Rails console and got this warning message: WARNING: Nokogiri was built against LibXML version 2.7.3, but has dynamically loaded 2.7.8 How can I fix it? Answer 1: I reinstalled Ruby, and that fixed it. I was then able to use the built-in libraries. Answer 2: I have found some fixes for Lion, but none for Mountain Lion yet. Nonetheless, I have tried this and it works: gem uninstall

Nokogiri: How to select nodes by matching text?

流过昼夜 submitted on 2019-12-17 10:19:36
Question: If I have a bunch of elements like: <p>A paragraph <ul><li>Item 1</li><li>Apple</li><li>Orange</li></ul></p> Is there a built-in Nokogiri method that would get me all, for example, p elements that contain the text "Apple"? (The example element above would match, for instance.) Answer 1: Nokogiri can do this (now) using jQuery extensions to CSS: require 'nokogiri' html = ' <html> <body> <p>foo</p> <p>bar</p> </body> </html> ' doc = Nokogiri::HTML(html) doc.at('p:contains("bar")').text.strip => "bar

How do I parse an HTML table with Nokogiri?

点点圈 submitted on 2019-12-17 08:21:34
Question: I installed Ruby and Mechanize. It seems to me that it is possible to do what I want with Nokogiri, but I do not know how. What about this table? It is just part of the HTML of a vBulletin forum site. I tried to keep the HTML structure but deleted some text and tag attributes. I want to get some details per thread, like: Title, Author, Date, Time, Replies, and Views. Please note that there are a few tables in the HTML document. I am after one particular table with its tbody , <tbody