nokogiri | 易学教程

How can I make empty tags self-closing with Nokogiri?

I've created an XML template in ERB. I fill it in with data from a database during an export process. In some cases there is a null value, in which case an element may be empty, like this: <someitem> </someitem>. In that case, the client receiving the export wants it converted into a self-closing tag: <someitem/>. I'm trying to see how to get Nokogiri to do this, but I don't see it yet. Does anybody know how to make empty XML tags self-closing with Nokogiri?

Update: A regex was sufficient to do what I specified above, but the client now also wants tags whose children are all empty to be
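The question says a regex was enough for the simple case. A minimal sketch of that idea in plain Ruby follows. The helper name self_close_empty_tags and the pattern are assumptions for illustration, not the asker's actual regex, and the approach assumes simple, well-formed markup with no comments or CDATA sections that look like tags. It handles only elements that contain nothing but whitespace, not the nested case the update mentions.

```ruby
# Hypothetical pattern: an opening tag (with optional attributes),
# only whitespace, then the matching closing tag. The negative
# lookbehind skips tags that are already self-closing.
EMPTY_ELEMENT = %r{<([A-Za-z_][\w.:\-]*)((?:\s[^<>]*?)?)\s*(?<!/)>\s*</\1\s*>}

# Hypothetical helper: rewrite whitespace-only elements as <name/>.
def self_close_empty_tags(xml)
  xml.gsub(EMPTY_ELEMENT) { "<#{$1}#{$2}/>" }
end

puts self_close_empty_tags("<export><someitem> </someitem><name>Ann</name></export>")
# prints <export><someitem/><name>Ann</name></export>
```

Elements with real content, such as <name>Ann</name>, are left untouched because the pattern allows only whitespace between the tags.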

How do I get innerHtml using Nokogiri gem

Question: For example, I have this HTML:

<div class="item"> <p> bla bla<br/> bla bla </p> </div>

I need to get the inner HTML of div.item:

<p> bla bla<br/> bla bla </p>

I know that I can use doc.css("div.item:first").text. The text method returns clean text without any HTML tags, but what should I do to get the inner HTML of div.item? I tried doc.css("div.item:first").html, but it doesn't work, and the documentation did not help either. Any ideas?

Answer 1: If you just need the string: doc.css("div.item:first").inner_html

Source: https:/

Nokogiri fails to install on OS X

There are many posts on this issue; however, there can be a couple of reasons why Nokogiri (version 1.6.x, 1.7.x or 1.8.x) fails to install on OS X.

Related articles:
OS X 10.6 Installing Nokogiri
cann't install nokogiri 1.6.1 on Mac OS X 10.9 Maveriks

Command Line Tools not installed: This is the easiest one to check and fix. Run xcode-select --install in a terminal window to install them, then try installing Nokogiri one more time by running gem install nokogiri. Some are reporting that they used gem install nokogiri -- --use-system-libraries

Libxml, libxslt are too recent: This situation is due to

Remove a tag but keep the text

So I have this <a> tag in an XML file:

<a href="/www.somethinggggg.com">Something 123</a>

My desired result is to use Nokogiri to completely remove the tag so it is no longer a clickable link, e.g.:

Something 123

My attempt:

content = Nokogiri::XML.fragment(page_content)
content.search('.//a').remove

But this removes the text too. Any suggestions on how to achieve my desired result using Nokogiri?

Here is what I would do:

require 'nokogiri'
doc = Nokogiri::HTML.parse <<-eot
<a href="/www.somethinggggg.com">Something 123</a>
eot
node = doc.at("a")
node.replace(node.text)
puts doc.to_html

output <

How do I access HTML elements that are rendered in JavaScript using XPath?

Question: How do I get a <td> with a specific class name using XPath and Nokogiri? Tables are nested and some of them don't have IDs or classes, so I can't nest stuff like this: //table/tbody/tr/td

Here is what I have so far:

doc = Nokogiri::HTML(open("http://www.goalzz.com/default.aspx?c=8358"))
doc.xpath('//td[@class="m_g"]').each do |node|
  pp node.to_s
end

Any ideas? There are a few <td>s with that class name and I want to get all of them.

Answer 1: Using gem "capybara-webkit" is a viable way of

open xml file with nokogiri update node and save

Question: I'm trying to figure out how to open an XML file, search by an id, replace a value in the node and then save the document again.

My XML:

<?xml version="1.0"?>
<data>
<user id="1370018670618"> <email>1@1.com</email> <sent>false</sent> </user>
<user id="1370018701357"> <email>2@2.com</email> <sent>false</sent> </user>
<user id="1370018769724"> <email>3@3.com</email> <sent>false</sent> </user>
<user id="1370028546850"> <email>4@4.com</email> <sent>false</sent> </user>
<user id="1370028588345"> <email

Reading malformed XML with Nokogiri: Unescaped Ampersands in URL field

Question: I am trying to read an XML file from a third party with Nokogiri in my Rails project. One of the nodes I have to parse contains a URL with unescaped ampersands (like foo.com/index.html?page=1&query=bar). I understand that this is considered malformed XML, and Nokogiri just tries to parse it anyway, resulting in foo.com/index.html?page=1=bar. How can I obtain the full URL? Can I tweak Nokogiri? Would you do a search-and-replace pre-pass, or what would be the best practice?

Answer 1: Had the same issue
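The search-and-replace pre-pass the question mentions could look roughly like this. It is a sketch in plain Ruby, not the truncated answer's method: the helper name escape_bare_ampersands and the pattern are assumptions, and the pattern treats any "&name;" sequence as an existing entity, so a query string that happens to contain such a sequence would be left alone.

```ruby
# Matches "&" that does not already start an entity or character
# reference such as &amp;, &#38; or &#x26;.
BARE_AMPERSAND = /&(?!(?:[A-Za-z][A-Za-z0-9]*|#\d+|#x\h+);)/

# Hypothetical helper: escape bare ampersands before parsing.
def escape_bare_ampersands(xml)
  xml.gsub(BARE_AMPERSAND, "&amp;")
end

raw = "<url>foo.com/index.html?page=1&query=bar</url><t>a &amp; b</t>"
puts escape_bare_ampersands(raw)
# prints <url>foo.com/index.html?page=1&amp;query=bar</url><t>a &amp; b</t>
```

The cleaned string can then be handed to Nokogiri::XML, and the text of the URL node should come back with the original "&" intact.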

Can Nokogiri interpret javascript? - Web Scraping

We are trying to scrape the availabilities on this page: http://www.equityapartments.com/new-york/new-york-city-apartments/midtown-west/mantena-apartments.aspx I need to use my spider to select "All Floorplans" and fetch all the availabilities. But I believe the data are actually sent through a JavaScript request. Is there a way for my Nokogiri spider to render it? Or maybe simulate the process of clicking on buttons?

Nokogiri is just a parser. It also lets you search content. To interact with web pages you need to use something else, e.g. Watir and PhantomJS. Combining them all
