nokogiri

Data scraping multiple array creation and ordering

橙三吉。 posted on 2019-12-11 12:57:53
Question: We're trying to scrape the course name, qualification, and duration of each course and store each in a separate array. The code below pulls all of that, but the results seem to come back in random order, with some parts possibly ordered by page. Can anybody help?

    require 'mechanize'
    mechanize = Mechanize.new
    @duration_array = []
    @qual_array = []
    @courses_array = []
    page = mechanize.get('http://search.ucas.com/search/results?Vac=2&AvailableIn=2016&IsFeatherProcessed=True&page=1

Parsing XML to hash with Nori and Nokogiri with undesired result

限于喜欢 posted on 2019-12-11 11:42:01
Question: I am attempting to convert an XML document to a Ruby hash using Nori. But instead of receiving a collection of the root element, I get back a new node containing the collection. This is what I am doing:

    @xml = content_for(:layout)
    @hash = Nori.new(:parser => :nokogiri, :advanced_typecasting => false).parse(@xml)

or

    @hash = Hash.from_xml(@xml)

where the content of @xml is:

    <bundles>
      <bundle>
        <id>6073</id>
        <name>Bundle-1</name>
        <status>1</status>
        <bundle_type>
          <id>6713</id>
          <name>BundleType-1<
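XML-to-hash converters generally key the result by the root element, so the collection sits one level below the top of the hash. The sketch below assumes that shape and shows one way to reach the list. The `PARSED` literal is a hand-written stand-in rather than real parser output, the second bundle is made up, and `bundle_list` is a hypothetical helper name.

```ruby
# Assumed shape of the parsed result: the hash is keyed by the root element
# and the repeated <bundle> children are collected under their tag name.
# This literal is a hand-written stand-in; the second entry is invented.
PARSED = {
  'bundles' => {
    'bundle' => [
      { 'id' => '6073', 'name' => 'Bundle-1', 'status' => '1' },
      { 'id' => '6074', 'name' => 'Bundle-2', 'status' => '0' }
    ]
  }
}.freeze

# Dig through the root and item keys, then normalise to an array: a document
# with a single <bundle> usually comes back as one hash rather than a list.
def bundle_list(parsed)
  node = parsed.dig('bundles', 'bundle')
  return [] if node.nil?

  node.is_a?(Array) ? node : [node]
end

bundle_list(PARSED).each { |bundle| puts "#{bundle['id']}: #{bundle['name']}" }
```

Normalising to an array in one place means the calling code does not need to care whether the document held one bundle or many.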

Find a table containing specific text

旧街凉风 posted on 2019-12-11 11:00:09
Question: I have a table:

    html = '
    <table cellpadding="1" cellspacing="0" width="100%" border="0">
      <tr>
        <td colspan="9" class="csoGreen"><b class="white">Bill Statement Detail</b></td>
      </tr>
      <tr style="background-color: #D8E4F6;vertical-align: top;">
        <td nowrap="nowrap"><b>Bill Date</b></td>
        <td nowrap="nowrap"><b>Bill Amount</b></td>
        <td nowrap="nowrap"><b>Bill Due Date</b></td>
        <td nowrap="nowrap"><b>Bill (PDF)</b></td>
      </tr>
    </table>
    '

I used the code suggested in this post (XPath matching text in a

Parsing: Can I pick up the URL of embedded CSS Background in Nokogiri?

杀马特。学长 韩版系。学妹 posted on 2019-12-11 10:38:45
Question: The HTML I am parsing contains images set through inline CSS on a table. Can I use Nokogiri to determine what the URL component is? Here is a snippet of the code I'd like to parse. TL;DR: I'd like to get the .png in this HTML snippet using Nokogiri.

    <table border="0" cellspacing="0" cellpadding="0" width="300" height="300" background="http://s3.amazonaws.com/static.example.com/sale/homepage/3166-300x300-1328107072.png" style="background-image:url('http://s3.amazonaws.com/static.example.com/sale/homepage/3166

Nokogiri HTML parsing not working

为君一笑 posted on 2019-12-11 10:28:32
Question: I am trying to parse some HTML with Nokogiri, but I am not getting anything back from the css or xpath methods.

    require 'rubygems'
    require 'open-uri'
    require 'nokogiri'

    doc = Nokogiri::HTML(open("http://www.google.com"))

    doc.css('div').each do |div|
      puts div.content
    end

    doc.xpath('//div').each do |div|
      puts div.content
    end

Nothing gets printed to the screen, so css and xpath are returning empty arrays, even though there are at least 100 divs on Google's homepage. doc.to_html returns:

    <!DOCTYPE html>\n\n

Get value from an HTTP GET response body via Nokogiri?

泄露秘密 posted on 2019-12-11 10:26:11
Question: I get this result from an HTTP page:

    <!DOCTYPE html>
    <html>
      <head> <title>Captchaservice</title> </head>
      <body> 15 </body>
    </html>

And I use this Nokogiri code:

    doc = Nokogiri::HTML( response )
    id = doc.xpath('//').text

But I get \n 15 \n etc. I tried writing:

    id = doc.xpath('//').text.to_i

That gives me the value, but when I use this ID I get:

    undefined method `empty?' for 15:Fixnum

What am I doing wrong, and how do I get this integer value?

Answer 1: That's because your id is an instance of

Cannot install mechanize for Ruby on Mac

◇◆丶佛笑我妖孽 posted on 2019-12-11 09:58:44
Question: I am trying to install mechanize on Mac OS X version 10.7.3 with Ruby version 1.8.7. The problem is with one of its dependencies, nokogiri. I have seen other posts about needing Xcode installed, and I do have it; it is version 4.3.2. Here is the error I am receiving. Thank you in advance.

    sudo gem install mechanize
    Building native extensions. This could take a while...
    ERROR: Error installing mechanize:
    ERROR: Failed to build gem native extension.
    /System/Library/Frameworks/Ruby.framework/Versions/1.8

:has CSS pseudo class in Nokogiri

一个人想着一个人 posted on 2019-12-11 09:34:51
Question: I'm looking for the :has pseudo-class in Nokogiri. It should work just like jQuery's :has selector. For example:

    <li><h1><a href="dfd">ex1</a></h1><span class="string">sdfsdf</span></li>
    <li><h1><a href="dsfsdf">ex2</a></h1><span class="string"></span></li>
    <li><h1><a href="sdfd">ex3</a></h1></li>

The CSS selector should return only the first link, the one with the non-empty span.string sibling. In jQuery this selector works well:

    $('li:has(span.string:not(:empty))>h1>a')

but not in Nokogiri:

Nokogiri for selecting text and HTML between unique sets of tags

白昼怎懂夜的黑 posted on 2019-12-11 09:17:30
Question: I am trying to use Nokogiri to extract the text between two unique sets of tags. What is the best way to get the text within the p tag between <h2 class="point">The problem</h2> and <h2 class="point">The solution</h2>, and then all of the HTML between <h2 class="point">The solution</h2> and <div class="frame box sketh">? Sample of the full HTML:

    <h2 class="point">The problem</h2>
    <p>TEXT I WANT </p>
    <h2 class="point">The solution</h2>
    HTML I WANT with its own set of tags (but never

How do I scrape data through Mechanize and Nokogiri?

不羁的心 posted on 2019-12-11 09:13:53
Question: I am working on an application that gets the HTML from http://www.screener.in/. I can enter a company name like "Atul Auto Ltd", submit it, and then scrape the following details from the next page: "CMP/BV" and "CMP". I am using this code:

    require 'mechanize'
    require 'rubygems'
    require 'nokogiri'

    Company_name='Atul Auto Ltd.'
    agent = Mechanize.new
    page = agent.get('http://www.screener.in/')
    form = agent.page.forms[0]
    print agent.page.forms[0].fields
    agent.page.forms[0]["q"]=Company_name