nokogiri

Getting nokogiri to use a newer version of libxml2

╄→гoц情女王★ posted on 2019-12-04 07:39:50
I've been trying to get Nokogiri installed on my computer (Mountain Lion) to use with rspec and capybara, but for the life of me, I can't get it to run properly. From what I can tell, the issue is that nokogiri is using the wrong version of libxml2. So far I've tried uninstalling and reinstalling libxml2 using Homebrew (making sure it's the most recent version), uninstalling and reinstalling nokogiri using bundle, and specifying the exact path to the libxml2 files that Homebrew installed when installing the nokogiri gem. My most recent install instructions looked like this: sudo gem install nokogiri -

How to scrape pages which have lazy loading

北慕城南 posted on 2019-12-04 07:13:29
Here is the code I used to parse the web page. I ran it in the Rails console, but I am not getting any output there. The site I want to scrape uses lazy loading: require 'nokogiri' require 'open-uri' page = 1 while true url = "http://www.justdial.com/functions"+"/ajxsearch.php?national_search=0&act=pagination&city=Delhi+%2F+NCR&search=Pandits"+"&where=Delhi+Cantt&catid=1195&psearch=&prid=&page=#{page}" doc = Nokogiri::HTML(open(url)) doc = Nokogiri::HTML(doc.at_css('#ajax').text) d = doc.css(".rslwrp") d.each do |t| puts t.css(".jrcw").text puts t.css("span.jcn").text

How to use nokogiri from Jruby on Windows?

心已入冬 posted on 2019-12-04 05:14:52
Question: I'm getting the following error when trying to use Nokogiri with JRuby on Windows 7: D:\code\h4>jruby -e "require 'rubygems'; require 'nokogiri'" D:/jruby-1.3.1/bin/../lib/ruby/1.8/ffi/library.rb:18:in `ffi_lib': Could not open any of [xml2, xslt, exslt] (LoadError) from D:/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.3-java/lib/nokogiri/ffi/libxml.rb:5 from D:/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3.3-java/lib/nokogiri/ffi/libxml.rb:31:in `require' from D:/jruby-1.3.1/bin/../lib

HTML is read before fully loaded using open-uri and nokogiri

笑着哭i posted on 2019-12-04 04:30:13
I'm using open-uri and nokogiri with Ruby to do some simple web crawling. The problem is that the HTML is sometimes read before it is fully loaded. In such cases, I cannot fetch any content other than the loading icon and the nav bar. What is the best way to tell open-uri or nokogiri to wait until the page is fully loaded? Currently my script looks like: require 'nokogiri' require 'open-uri' url = "https://www.the-page-i-wanna-crawl.com" doc = Nokogiri::HTML(open(url, ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE)) puts doc.at_css("h2").text What you describe is not possible. The result of open

Can I incorporate system libraries (e.g. libxml2) I compile against into a gem (e.g. nokogiri) that I can deploy to Heroku?

这一生的挚爱 posted on 2019-12-04 04:10:26
Nokogiri has a problem with translating to and from UTF-8 characters that turns out to come from libxml2, specifically version 2.7.6, which is the highest supported version on Ubuntu 10.04 LTS. The bug is fixed in version 2.7.7 and up, but since our app is hosted on Heroku (bamboo-ree-1.8.7 stack, based on Ubuntu 10.04), we have to use version 2.7.6 and continue to experience the bug, unless: (1) someone can hack or has hacked nokogiri to get around the problem; (2) Canonical bumps the supported libxml2 version for Ubuntu 10.04 (and/or Heroku updates libxml2 in their stack); (3) I can come up with a way for

Is there something similar to Nokogiri for parsing Ruby code?

允我心安 posted on 2019-12-04 03:57:27
Nokogiri is awesome. I can do things like #css('.bla') which will return the first matching element. Right now we need to do some parsing of Ruby source code, such as finding all methods within a class. We're using the ruby_parser gem, but all it does is comb your source code and spit out S-expressions. Is there anything like Nokogiri for these S-expressions which can do things like "return S-expression for first method found named 'foo'"? The only thing I can think of is Adam Sanderson's SExpPath library. Although I am accepting Jörg's answer because it is more complete, I ended up discovering
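To make the "first method named 'foo'" query concrete, here is a minimal sketch that uses Ripper from Ruby's standard library rather than ruby_parser or SExpPath, so the node shapes differ from what those gems produce. The helper name `find_def` and the sample class are hypothetical; the approach is a plain depth-first walk over the nested arrays that `Ripper.sexp` returns.

```ruby
require 'ripper'

# Depth-first search for the first `def` node whose name matches.
# Ripper represents a method definition as
#   [:def, [:@ident, "name", [line, column]], params, body]
def find_def(sexp, name)
  return nil unless sexp.is_a?(Array)

  ident = sexp[1]
  if sexp[0] == :def && ident.is_a?(Array) && ident[0] == :@ident && ident[1] == name
    return sexp
  end

  sexp.each do |child|
    found = find_def(child, name)
    return found if found
  end
  nil
end

source = <<~RUBY
  class Bla
    def bar; end
    def foo; 42; end
  end
RUBY

tree = Ripper.sexp(source)
foo = find_def(tree, 'foo')

puts "found #{foo[1][1]} on line #{foo[1][2][0]}" if foo
```

A walk like this only covers plain `def` nodes; singleton methods (`def self.foo`) appear under a different node type and would need their own check.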

get div nested in div element using Nokogiri

心已入冬 posted on 2019-12-04 03:15:05
I want to parse the following HTML with Nokogiri and get this result: event_name = "folk concert 2" event_link = "http://www.douban.com/event/12761580/" event_date = "20th,11,2010" I know doc.xpath('//div[@class="nof clearfix"]') can get each div element, but how should I proceed to get each attribute such as event_name, and especially the date? HTML: <div class="nof clearfix"> <h2><a href="http://www.douban.com/event/12761580/">folk concert 2</a> <span class="pl2"> </span></h2> <div class="pl intro"> Date:25th,11,2010<br/> </div> </div> <div class="nof clearfix"> <h2><a href="http

How can I add a child to a node at a specific position?

大憨熊 posted on 2019-12-04 01:36:45
I have a node which has two children: an HTML text and an HTML element. <h1 id="Installation-blahblah">Installation on server<a href="#Installation-blah" class="wiki-anchor">¶</a> </h1> In this case the HTML text is: Installation on server and the HTML element: <a href="#Installation-blah" class="wiki-anchor">anchor;</a> I then create a node like this: span_node = Nokogiri::HTML::Node.new('span',doc) span_node['class'] = 'edit-section' link_node = Nokogiri::HTML::Node.new('a',doc) link_node['href'] = "/wiki/#{page_id}/#{@page.title}/edit?section=#{section_index}" link_node['class'] = 'icon

Nokogiri leaving HTML entities untouched

一个人想着一个人 posted on 2019-12-04 00:49:52
I want Nokogiri to leave HTML entities untouched, but it seems to be converting the entities into the actual symbol. For example: Nokogiri::HTML.fragment('<p>&reg;</p>').to_s results in: "<p>®</p>" Nothing seems to return the original HTML back to me. The .inner_html, .text, and .content methods all return '®' instead of '&reg;'. Is there a way for Nokogiri to leave these HTML entities untouched? I've already searched Stack Overflow and found similar questions, but nothing exactly like this one. Not an ideal answer, but you can force it to generate entities (if not nice names) by setting the allowed

How to add attribute to Nokogiri node?

蹲街弑〆低调 posted on 2019-12-03 22:31:38
I'm trying to add an attribute to an existing Nokogiri node. What I've done is this: node.attributes['foobar'] = Nokogiri::XML::Attr.new('foo', 'bar') But I get the error: TypeError Exception: wrong argument type String (expected Data) What is a Data data type, and how do I add an attribute to the Nokogiri object? Thanks! I believe you should just need to use the []= method, i.e. node['foo'] = 'bar' You could also use node.set_attribute('foo', 'bar'). Source: https://stackoverflow.com/questions/3614458/how-to-add-attribute-to-nokogiri-node