nokogiri | 易学教程

Building blank XML tags with Nokogiri?


Question: I'm trying to build up an XML document using Nokogiri. Everything is pretty standard so far; most of my code just looks something like:

    builder = Nokogiri::XML::Builder.new do |xml|
      ...
      xml.Tag1(object.attribute_1)
      xml.Tag2(object.attribute_2)
      xml.Tag3(object.attribute_3)
      xml.Tag4(nil)
    end
    builder.to_xml

However, that results in a tag like <Tag4/> instead of <Tag4></Tag4>, which is what my end user has specified that the output needs to be. How do I tell Nokogiri to put full tags around a

Nokogiri html parsing question


Question: I'm having trouble figuring out why I can't get keywords to parse properly through Nokogiri. In the following example, I have the a href link text functionality working properly but cannot figure out how to pull the keywords. This is the code I have thus far:

    .....
    doc = Nokogiri::HTML(open("http://www.cnn.com"))
    doc.xpath('//a/@href').each do |node|
    #doc.xpath("//meta[@name='Keywords']").each do |node|
      puts node.text
    ....

This successfully renders all of the a href text in the page, but when

Nokogiri Error: undefined method `radiobutton_with' - Why?


I'm trying to access a form using Mechanize (Ruby). On my form I have a group of radio buttons, and I want to check one of them. I wrote:

    target_form = (page/:form).find{ |elem| elem['id'] == 'formid'}
    target_form.radiobutton_with(:name => "radiobuttonname")[2].check

In this line I want to check the radio button with the value of 2, but I get an error:

    undefined method `radiobutton_with' for #<Nokogiri::XML::Element:0x9b86ea> (NoMethodError)

The problem occurred because using a Mechanize page as a Nokogiri document (by calling the / method, or search, or xpath, etc.) returns Nokogiri

get div nested in div element using Nokogiri


Question: For the following HTML, I want to parse it and get the following result using Nokogiri.

    event_name = "folk concert 2"
    event_link = "http://www.douban.com/event/12761580/"
    event_date = "20th,11,2010"

I know doc.xpath('//div[@class="nof clearfix"]') could get each div element, but how should I proceed to get each value such as event_name, and especially the date?

HTML:

    <div class="nof clearfix">
    <h2><a href="http://www.douban.com/event/12761580/">folk concert 2</a> <span class="pl2"> </span></h2>

Screen scraping in clojure


I googled, but I can't find a satisfactory answer. This SO question is related, but it is fairly old and is the exact opposite of what I am looking for: a way to do screen scraping using XPath, not CSS selectors. I've used Enlive for some basic screen scraping, but sometimes one needs the power of XPath selectors. So here it is: Is there any equivalent to Nokogiri or lxml for Clojure (Java)? What is the state of the "pure Java Nokogiri"? Any way to use the library from Clojure? Any better alternatives than this hack?

There are a couple of possibilities here. Several of these require semi-well

Image scraping in Ruby


How do I scrape an image present on a particular URL using Nokogiri? If there are better options than Nokogiri, please suggest them. The CSS selector for the image tag is .profilePic img

Phrogz: If it is just an <img> with a URL:

    PAGE = "http://site.com/page.html"
    require 'nokogiri'
    require 'open-uri'
    # open-uri: use URI.open (Kernel#open no longer opens URLs on Ruby 3.0+)
    html = Nokogiri.HTML(URI.open(PAGE))
    src = html.at('.profilePic img')['src']
    File.open("foo.png", "wb") do |f|
      f.write(URI.open(src).read)
    end

If you need to turn a relative image path into an absolute, see: https://stackoverflow.com/a/4864170/405017

The lazy way is to use mechanize as it will figure out the urls and filenames
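
For the relative-to-absolute step mentioned above, one common approach is the standard library's URI.join, sketched here. This is not necessarily what the linked answer does; the helper name absolute_image_url and the example paths are hypothetical.

```ruby
require 'uri'

# Hypothetical helper: resolves an <img> src against the URL of the page
# it was found on. An already-absolute src is returned unchanged.
def absolute_image_url(page_url, src)
  URI.join(page_url, src).to_s
end

puts absolute_image_url("http://site.com/page.html", "images/me.png")
puts absolute_image_url("http://site.com/page.html", "/static/me.png")
```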

How can I put a string with an ampersand in an XML file with Nokogiri?


I'm trying to include a URL to an image in an XML file, and the ampersands in the URL query string are getting stripped out:

    bgdoc.xpath('//Master').each do |elem|
      part = elem.xpath('Part').inner_text
      image = imagehash[part]
      image = "" if image.blank?
      elem.xpath('Image').first.content = "<![CDATA[#{image}]]>"
      puts elem.xpath('Image').first.content
    end

bgdoc is getting written out with the help of Builder later on. But not the individual elements, it's getting inserted all at once. That makes it a different case than a similar question posted on SO.

You should be using create_cdata to create a

How to tidy up malformed xml in ruby


I'm having issues tidying up malformed XML I'm getting back from the SEC's EDGAR database. For some reason their XML is horribly formed. Tags that contain any sort of string aren't closed, and a tag can actually contain other XML or HTML documents nested inside it. Normally I'd hand this off to Tidy, but that isn't being maintained. I've tried using Nokogiri::XML::SAX::Parser, but that seems to choke because the tags aren't closed. It seems to work all right until it hits the first ending tag, and then it doesn't fire on any more of them. But it is spitting out the right characters. class

Is there something similar to Nokogiri for parsing Ruby code?


Question: Nokogiri is awesome. I can do things like #css('.bla') which will return the first matching element. Right now we need to do some parsing of Ruby source code, such as finding all methods within a class. We're using the ruby_parser gem, but all it does is comb your source code and spit out S-expressions. Is there anything like Nokogiri for these S-expressions which can do things like "return S-expression for first method found named 'foo'"?

Answer 1: The only thing I can think of is Adam Sanderson's

How can I add a child to a node at a specific position?


Question: I have a node which has two children: an HTML text and an HTML element.

    <h1 id="Installation-blahblah">Installation on server<a href="#Installation-blah" class="wiki-anchor">¶</a> </h1>

In this case the HTML text is:

    Installation on server

and the HTML element:

    <a href="#Installation-blah" class="wiki-anchor">anchor;</a>

I then create a node like this:

    span_node = Nokogiri::HTML::Node.new('span',doc)
    span_node['class'] = 'edit-section'
    link_node = Nokogiri::HTML::Node.new('a',doc)
    link_node[