nokogiri | 易学教程

How to convert a LinkedIn client in Rails to a Hash

Question: I create a LinkedIn client like this:

    client = LinkedIn::Client.new("3333", "rrrrrrr")
    client.authorize_from_access(session[:atoken], session[:asecret])

and get the profile information like this:

    @profile = client.profile

When I print the profile with "puts client.profile", I get the following output:

    #<LinkedIn::Profile:0x4a77770 @doc=#<Nokogiri::XML::Document:0x253bb64 name="document" children=[#<Nokogiri::XML::Element:0x253b9fc name="person" children=[#<Nokogiri::XML::Text:0x253b87c "\n ">, #

Display data scraped from Nokogiri in Rails?

Question: I recently started learning Rails and am trying to build a simple application that scrapes football fixtures from a website and displays the data in my index.html. Users can then try to predict the scoreline of the fixtures. I managed to scrape the data into a fixtures.rb file using Nokogiri:

    require 'nokogiri'
    require 'open-uri'

    doc = Nokogiri::HTML(open("http://www.bbc.co.uk/sport/0/football/21784836"))
    doc.css("tr.row2").each do |item|
      puts item.at_css("td.left.first p").text
    end

What

How to get the horizontal depth of a node?

Question: Note: I made up the term "horizontal depth" to measure the sub-dimension of a node within a tree. So imagine a <td> which would have an XPath something like /html/table/tbody/tr/td, and a "horizontal depth" of 5. I am trying to see if there is a way to identify and select elements based on this horizontal depth. How can I find the maximum depth?

Answer 1: If you need all the nodes with depth >= 5:

    /*/*/*/*//*

And if you need all the nodes with depth == 5:

    /*/*/*/*/*

Actually, there is an XPath function count ,

How can I put a string with an ampersand in an XML file with Nokogiri?

Question: I'm trying to include a URL to an image in an XML file, and the ampersands in the URL query string are getting stripped out:

    bgdoc.xpath('//Master').each do |elem|
      part = elem.xpath('Part').inner_text
      image = imagehash[part]
      image = "" if image.blank?
      elem.xpath('Image').first.content = "<![CDATA[#{image}]]>"
      puts elem.xpath('Image').first.content
    end

bgdoc is getting written out with the help of Builder later on. But not the individual elements, it's getting inserted all at once. That makes

Nokogiri equivalent of jQuery closest() method for finding first matching ancestor in tree

Question: jQuery has a lovely, if somewhat misnamed, method called closest() that walks up the DOM tree looking for a matching element. For example, if I've got this HTML:

    <table src="foo">
      <tr>
        <td>Yay</td>
      </tr>
    </table>

Assuming element is set to <td>, then I can figure out the value of src like this:

    element.closest('table')['src']

And that will cleanly return "undefined" if either the table element or its src attribute is missing. Having gotten used to this in JavaScript-land, I'd love to find

Why can't I load Nokogiri?

Question: I installed Nokogiri without any issues by running:

    $ sudo gem install nokogiri
    Building native extensions. This could take a while...
    Successfully installed nokogiri-1.5.9
    1 gem installed
    Installing ri documentation for nokogiri-1.5.9...
    Installing RDoc documentation for nokogiri-1.5.9...

When I run nokogiri.rb:

    #!/usr/bin/ruby -w
    require 'nokogiri'
    puts "Current directory is: #{ Dir.pwd }"
    Dir.chdir("/home/askar/xml_files1") do |dirname|
      puts "Now in: #{ Dir.pwd }"
      xml_files = Dir.glob(

Cleaning HTML with Nokogiri (instead of Tidy)

Question: The tidy gem is no longer maintained and has multiple memory leak issues. Some people suggested using Nokogiri. I'm currently cleaning the HTML using:

    Nokogiri::HTML::DocumentFragment.parse(html).to_html

I've got two issues though:

- Nokogiri removes the DOCTYPE
- Is there an easy way to force the cleaned HTML to have an html and body tag?

Answer 1: If you are processing a full document, you want:

    Nokogiri::HTML(html).to_html

That will force html and body tags, and introduce or preserve the DOCTYPE:

Inserting and deleting XML nodes and elements using Nokogiri

Question: I want to extract parts of an XML file and make a note in that file that I extracted some part, like "here something was extracted". I'm trying to do this with Nokogiri, but it does not seem to be documented how to:

- delete all children of a <Nokogiri::XML::Element>
- change the inner_text of that complete element

Any clues?

Answer 1: Nokogiri makes this pretty easy. Using this document as an example, the following code will find all vitamins tags, remove their children (and the children's

Following a link using Nokogiri for scraping

Question: Is there a method to follow a link using Nokogiri for scraping? I know I can extract the href and open it, but I thought I saw a method to do this using Hpricot and was wondering if there was something like that in Nokogiri.

Answer 1: Here is an excellent screen scraping guide for using Ruby, Nokogiri, Hpricot, and Firebug. Personally I am a big fan of using Mechanize, which is a headless browser, for screen scraping. You can use Mechanize to navigate links and fill out forms and it will handle

I need to scrape data from a Facebook game - using Ruby

Question: Revised (clarified question). I've spent a few days already trying to figure out how to scrape specific information from a Facebook game; however, I've run into brick wall after brick wall. As best as I can tell, the main problem is as follows. I can use Chrome's inspect element tool to manually find the HTML that I need; it appears nestled inside an iframe. However, when I try to scrape that iframe, it is empty (except for properties):

    <iframe id="game_frame" name="game_frame" src=""