nokogiri

Building blank XML tags with Nokogiri?

限于喜欢 submitted on 2019-12-06 03:49:36
Question: I'm trying to build up an XML document using Nokogiri. Everything is pretty standard so far; most of my code just looks something like: builder = Nokogiri::XML::Builder.new do |xml| ... xml.Tag1(object.attribute_1) xml.Tag2(object.attribute_2) xml.Tag3(object.attribute_3) xml.Tag4(nil) end builder.to_xml However, that results in a tag like <Tag4/> instead of <Tag4></Tag4>, which is what my end user has specified that the output needs to be. How do I tell Nokogiri to put full tags around a

Nokogiri html parsing question

血红的双手。 submitted on 2019-12-06 02:47:16
Question: I'm having trouble figuring out why I can't get keywords to parse properly through Nokogiri. In the following example, I have the a href link text functionality working properly but cannot figure out how to pull the keywords. This is the code I have thus far: ..... doc = Nokogiri::HTML(open("http://www.cnn.com")) doc.xpath('//a/@href').each do |node| #doc.xpath("//meta[@name='Keywords']").each do |node| puts node.text .... This successfully renders all of the a href text in the page, but when

Nokogiri Error: undefined method `radiobutton_with' - Why?

百般思念 submitted on 2019-12-06 00:10:20
I am trying to access a form using mechanize (Ruby). On my form I have a group of radio buttons, and I want to check one of them. I wrote: target_form = (page/:form).find{ |elem| elem['id'] == 'formid'} target_form.radiobutton_with(:name => "radiobuttonname")[2].check In this line I want to check the radio button with the value of 2. But in this line, I get an error: undefined method `radiobutton_with' for #<Nokogiri::XML::Element:0x9b86ea> (NoMethodError) The problem occurred because using a Mechanize page as a Nokogiri document (by calling the / method, or search , or xpath , etc.) returns Nokogiri

get div nested in div element using Nokogiri

。_饼干妹妹 submitted on 2019-12-05 20:46:05
Question: For the following HTML, I want to parse it and get the following result using Nokogiri. event_name = "folk concert 2" event_link = "http://www.douban.com/event/12761580/" event_date = "20th,11,2010" I know doc.xpath('//div[@class="nof clearfix"]') could get each div element, but how should I proceed to get each attribute like event_name , and especially the date ? HTML <div class="nof clearfix"> <h2><a href="http://www.douban.com/event/12761580/">folk concert 2</a> <span class="pl2"> </span></h2>

Screen scraping in clojure

给你一囗甜甜゛ submitted on 2019-12-05 19:38:54
I googled, but I can't find a satisfactory answer. This SO question is related but kinda old as well as the exact opposite of what I am looking for: a way to do screen-scraping using XPath, not CSS selectors. I've used enlive for some basic screen-scraping but sometimes one needs the power of XPath selectors. So here it is: Is there any equivalent to Nokogiri or lxml for clojure (java)? What is the state of the "pure java Nokogiri"? Any way to use the library from clojure? Any better alternatives than this hack ? There are a couple of possibilities here. Several of these require semi-well

Image scraping in Ruby

自作多情 submitted on 2019-12-05 19:27:53
How do I scrape an image present on a particular URL using Nokogiri? If there are better options than Nokogiri, please suggest them. The CSS selector for the image is .profilePic img Phrogz If it is just an <img> with a URL: PAGE = "http://site.com/page.html" require 'nokogiri' require 'open-uri' html = Nokogiri.HTML(open(PAGE)) src = html.at('.profilePic img')['src'] File.open("foo.png", "wb") do |f| f.write(open(src).read) end If you need to turn a relative image path into an absolute one, see: https://stackoverflow.com/a/4864170/405017 The lazy way is to use mechanize as it will figure out the URLs and filenames

How can I put a string with an ampersand in an xml file with Nokogiri?

余生长醉 submitted on 2019-12-05 19:19:28
I'm trying to include a URL to an image in an XML file, and the ampersands in the URL query string are getting stripped out: bgdoc.xpath('//Master').each do |elem| part = elem.xpath('Part').inner_text image = imagehash[part] image = "" if image.blank? elem.xpath('Image').first.content = "<![CDATA[#{image}]]>" puts elem.xpath('Image').first.content end bgdoc is getting written out with the help of Builder later on. But not the individual elements, it's getting inserted all at once. That makes it a different case than a similar question posted on SO. You should be using create_cdata to create a

How to tidy up malformed xml in ruby

怎甘沉沦 submitted on 2019-12-05 18:59:30
I'm having issues tidying up malformed XML code I'm getting back from the SEC's edgar database . For some reason they have horribly formed XML. Tags that contain any sort of string aren't closed and it can actually contain other XML or HTML documents inside other tags. Normally I'd hand this off to Tidy but that isn't being maintained. I've tried using Nokogiri::XML::SAX::Parser but that seems to choke because the tags aren't closed. It seems to work alright until it hits the first ending tag and then it doesn't fire off on any more of them. But it is spitting out the right characters. class

Is there something similar to Nokogiri for parsing Ruby code?

牧云@^-^@ submitted on 2019-12-05 18:37:13
Question: Nokogiri is awesome. I can do things like #css('.bla') which will return the first matching element. Right now we need to do some parsing of Ruby source code - finding all methods within a class etc. We're using the ruby_parser gem, but all it does is comb your source code and spit out S-expressions. Is there anything like Nokogiri for these S-expressions which can do things like "return S-expression for first method found named 'foo'"? Answer 1: The only thing I can think of is Adam Sanderson's

How can I add a child to a node at a specific position?

倖福魔咒の submitted on 2019-12-05 15:04:16
Question: I have a node which has two children: an HTML text and an HTML element. <h1 id="Installation-blahblah">Installation on server<a href="#Installation-blah" class="wiki-anchor">¶</a> </h1> In this case the HTML text is: Installation on server and the HTML element: <a href="#Installation-blah" class="wiki-anchor">anchor;</a> I then create a node like this: span_node = Nokogiri::HTML::Node.new('span',doc) span_node['class'] = 'edit-section' link_node = Nokogiri::HTML::Node.new('a',doc) link_node[