nokogiri

How can I make empty tags self-closing with Nokogiri?

你离开我真会死。 posted on 2019-12-01 18:33:05
I've created an XML template in ERB and fill it in with data from a database during an export process. When a value is null, an element may end up empty, like this:

    <someitem> </someitem>

In that case, the client receiving the export wants it converted into a self-closing tag:

    <someitem/>

I'm trying to get Nokogiri to do this, but I don't see how yet. Does anybody know how to make empty XML tags self-closing with Nokogiri?

Update: A regex was sufficient to do what I specified above, but the client now also wants tags whose children are all empty to be

How do I get innerHtml using Nokogiri gem

安稳与你 posted on 2019-12-01 15:43:18
Question: For example, I have this HTML:

    <div class="item"> <p> bla bla<br/> bla bla </p> </div>

I need to get the inner HTML of div.item:

    <p> bla bla<br/> bla bla </p>

I know that I can use doc.css("div.item:first").text, but the text method returns plain text without any HTML tags. What should I do to get the inner HTML of div.item? I tried doc.css("div.item:first").html, but it doesn't work, and the documentation did not help either. Any ideas?

Answer 1: If you just need the string:

    doc.css("div.item:first").inner_html

Source: https:/

Nokogiri fails to install on OS X

ぐ巨炮叔叔 posted on 2019-12-01 14:34:26
There are many posts on this issue; however, there can be a couple of reasons why Nokogiri (version 1.6.x, 1.7.x or 1.8.x) fails to install on OS X.

Related articles:
OS X 10.6 Installing Nokogiri
cann't install nokogiri 1.6.1 on Mac OS X 10.9 Maveriks

Command Line Tools not installed: This is the easiest one to check and fix. Run this in a terminal window to install them:

    xcode-select --install

Then try installing Nokogiri one more time by running:

    gem install nokogiri

Some are reporting that they used:

    gem install nokogiri -- --use-system-libraries

Libxml, libxslt are too recent: This situation is due to

Nokogiri fails to install on OS X

微笑、不失礼 posted on 2019-12-01 12:45:11
Question: There are many posts on this issue; however, there can be a couple of reasons why Nokogiri (version 1.6.x, 1.7.x or 1.8.x) fails to install on OS X.

Related articles:
OS X 10.6 Installing Nokogiri
cann't install nokogiri 1.6.1 on Mac OS X 10.9 Maveriks

Answer 1: Command Line Tools not installed: This is the easiest one to check and fix. Run this in a terminal window to install them:

    xcode-select --install

Then try installing Nokogiri one more time by running:

    gem install nokogiri

Some are reporting that they

Remove a tag but keep the text

流过昼夜 posted on 2019-12-01 12:44:56
So I have this <a> tag in an XML file:

    <a href="/www.somethinggggg.com">Something 123</a>

My desired result is to use Nokogiri to completely remove the tag so that it is no longer a clickable link, e.g.:

    Something 123

My attempt:

    content = Nokogiri::XML.fragment(page_content)
    content.search('.//a').remove

But this removes the text too. Any suggestions on how to achieve my desired result using Nokogiri?

Here is what I would do:

    require 'nokogiri'

    doc = Nokogiri::HTML.parse <<-eot
    <a href="/www.somethinggggg.com">Something 123</a>
    eot

    node = doc.at("a")
    node.replace(node.text)
    puts doc.to_html

Output: <

How do I access HTML elements that are rendered in JavaScript using XPath?

痴心易碎 posted on 2019-12-01 12:00:03
Question: How do I get a <td> with a specific class name using XPath and Nokogiri? Tables are nested and some of them don't have IDs or classes, so I can't nest selectors like this:

    //table/tbody/tr/td

Here is what I have so far:

    doc = Nokogiri::HTML(open("http://www.goalzz.com/default.aspx?c=8358"))
    doc.xpath('//td[@class="m_g"]').each do |node|
      pp node.to_s
    end

Any ideas? There are a few <td>s with that class name and I want to get all of them.

Answer 1: Using gem "capybara-webkit" is a viable way of

open xml file with nokogiri update node and save

让人想犯罪 __ posted on 2019-12-01 11:12:02
Question: I'm trying to figure out how to open an XML file, search by an id, replace a value in the node, and then save the document again. My XML:

    <?xml version="1.0"?>
    <data>
      <user id="1370018670618">
        <email>1@1.com</email>
        <sent>false</sent>
      </user>
      <user id="1370018701357">
        <email>2@2.com</email>
        <sent>false</sent>
      </user>
      <user id="1370018769724">
        <email>3@3.com</email>
        <sent>false</sent>
      </user>
      <user id="1370028546850">
        <email>4@4.com</email>
        <sent>false</sent>
      </user>
      <user id="1370028588345">
        <email

Reading malformed XML with Nokogiri: Unescaped Ampersands in URL field

廉价感情. posted on 2019-12-01 08:30:37
Question: I am trying to read an XML file from a third party with Nokogiri in my Rails project. One of the nodes I have to parse contains a URL with unescaped ampersands (like foo.com/index.html?page=1&query=bar). I understand that this is considered malformed XML, and Nokogiri just tries to parse it anyway, resulting in foo.com/index.html?page=1=bar. How can I obtain the full URL? Can I tweak Nokogiri? Would you do a search-and-replace pre-run, or what would be the best practice?

Answer 1: Had the same issue

Can Nokogiri interpret javascript? - Web Scraping

折月煮酒 posted on 2019-12-01 07:35:59
We are trying to scrape the availabilities on this page: http://www.equityapartments.com/new-york/new-york-city-apartments/midtown-west/mantena-apartments.aspx

I need my spider to select "All Floorplans" and fetch all the availabilities, but I believe the data is actually loaded through a JavaScript request. Is there a way for my Nokogiri spider to render it? Or maybe simulate the process of clicking on buttons?

Nokogiri is just a parser. It also lets you search content. To interact with web pages you need to use something else, e.g. Watir and PhantomJS. Combining them all

Can Nokogiri interpret javascript? - Web Scraping

自古美人都是妖i posted on 2019-12-01 05:06:00
Question: We are trying to scrape the availabilities on this page: http://www.equityapartments.com/new-york/new-york-city-apartments/midtown-west/mantena-apartments.aspx

I need my spider to select "All Floorplans" and fetch all the availabilities, but I believe the data is actually loaded through a JavaScript request. Is there a way for my Nokogiri spider to render it? Or maybe simulate the process of clicking on buttons?

Answer 1: Nokogiri is just a parser. It also lets you search content. To