nokogiri | 易学教程 (Easy Learning Tutorials)

Adding namespace using Nokogiri's XML Builder

Question: I have been racking my brain for a few hours, but I can't work out how to add an XMLNS namespace while using the Nokogiri XML Builder class to construct an XML structure. For instance, consider the XML sample below: I can create everything between the GetQuote tags, but creating the "p:ACMRequest" element remains a mystery. I came across this reference, https://gist.github.com/428455/7a15f84cc08c05b73fcec2af49947d458ae3b96a, but it still doesn't make sense to me. Even referring to the XML

Getting viewable text words via Nokogiri

Question: I'd like to open a web page with Nokogiri, extract all the words that a user sees when visiting the page in a browser, and analyze the word frequency. What is the easiest way to get all readable words out of an HTML document with Nokogiri? The ideal code snippet would take an HTML page (as a file, say) and return an array of individual words coming from all types of readable elements. (No need to worry about JavaScript or CSS hiding elements and thus hiding words; just all

Error installing nokogiri 1.6.0 on mac (libxml2)

Question: UPDATE: Fixed. I found the answer in another thread. The workaround I used is to tell Nokogiri to use the system libraries instead:

NOKOGIRI_USE_SYSTEM_LIBRARIES=1 bundle install

I'm trying to install nokogiri 1.6.0 on a Mac. With previous versions I had no problems, but 1.6.0 refuses to install. This is the error:

Building native extensions. This could take a while...
ERROR: Error installing nokogiri:
ERROR: Failed to build gem native extension.
/Users/josenriq/.rvm/rubies/ruby-1.9.3-head

Why doesn't Nokogiri xpath like xmlns declarations

Question: I'm using Nokogiri::XML to parse responses from Amazon SimpleDB. The response is something like:

<SelectResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07/">
  <SelectResult>
    <Item>
      <Attribute><Name>Foo</Name><Value>42</Value></Attribute>
      <Attribute><Name>Bar</Name><Value>XYZ</Value></Attribute>
    </Item>
  </SelectResult>
</SelectResponse>

If I just hand the response straight over to Nokogiri, all XPath queries (e.g. doc/"//Item/Attribute[Name='Foo']/Value") return an empty array. But if I

Strip style attributes with nokogiri

Question: I'm scraping an HTML page with Nokogiri and I want to strip out all style attributes. How can I achieve this? (I'm not using Rails, so I can't use its sanitize method, and I don't want to use the sanitize gem because I want to remove by blacklist, not keep by whitelist.)

html = open(url)
doc = Nokogiri::HTML(html.read)
doc.css('.post').each do |post|
  puts post.to_s
end

=> bla bla <a href="http://torrentfreak.com/netflix-is-killing-bittorrent-in-the-us-110427/">statistica</a>

How can I create a nokogiri case insensitive Xpath selector?

Question: I'm using Nokogiri to select the 'keywords' attribute like this:

puts page.parser.xpath("//meta[@name='keywords']").to_html

One of the pages I'm working with has the keywords label with a capital "K", which has motivated me to make the query case-insensitive: <meta name="keywords"> AND <meta name="Keywords">. So, my question is: what is the best way to make a Nokogiri selection case-insensitive? EDIT: Tomalak's suggestion below works great for this specific problem. I'd like to also use this

Print an XML document without the XML header line at the top

Question: I am just trying to find out how to do a to_xml with a Nokogiri::XML::Document or a Nokogiri::XML::DocumentFragment. Alternatively, I would like to use XPath on a Nokogiri::XML::DocumentFragment. I was unable to work out how to do that, although I am successfully parsing a Nokogiri::XML::Document. I later include a parsed and modified DocumentFragment in another piece of XML, but I'm really getting bitten by what I thought would be some really simple things. Like trying to do a to

WARNING: Nokogiri was built against LibXML version 2.7.3, but has dynamically loaded 2.7.8

Question: After a fresh install of Mac OS X 10.8 Mountain Lion, and after installing Ruby 1.9.3 and Ruby on Rails 3.2.6, I started the Rails console and got this warning message: WARNING: Nokogiri was built against LibXML version 2.7.3, but has dynamically loaded 2.7.8. How can I fix it?

Answer 1: I reinstalled Ruby, and that fixed it. I was able to use the built-in libraries.

Answer 2: I have found some fixes for Lion, but none for Mountain Lion yet. Nonetheless, I have tried this and it works: gem uninstall

Nokogiri: How to select nodes by matching text?

Question: If I have a bunch of elements like: A paragraph <ul><li>Item 1</li><li>Apple</li><li>Orange</li></ul> Is there a built-in Nokogiri method that would get me all, for example, p elements that contain the text "Apple"? (The example element above would match, for instance.)

Answer 1: Nokogiri can do this (now) using jQuery extensions to CSS:

require 'nokogiri'
html = ' <html> <body> foo bar </body> </html> '
doc = Nokogiri::HTML(html)
doc.at('p:contains("bar")').text.strip
=> "bar

How do I parse an HTML table with Nokogiri?

Question: I installed Ruby and Mechanize. It seems to me that it is possible in Nokogiri to do what I want to do, but I do not know how to do it. What about this table? It is just part of the HTML of a vBulletin forum site. I tried to keep the HTML structure but deleted some text and tag attributes. I want to get some details per thread, like: Title, Author, Date, Time, Replies, and Views. Please note that there are several tables in the HTML document; I am after one particular table with its tbody, <tbody