nokogiri | 易学教程

Generating XML with cdata using Ox?

I need to generate XML using Ox but didn't get much help from the documentation. I need to generate XML like this:

    <Jobpostings>
      <Postings>
        <Posting>
          <JobTitle><cdata>Programmer Analyst 3-IT</cdata></JobTitle>
          <Location><cdata>Romania,Bucharest...</cdata></Location>
          <CountryCode><cdata>US</cdata></CountryCode>
          <JobDescription><cdata>class technology to develop.</cdata></JobDescription>
        </Posting>
      </Postings>
    </Jobpostings>

I have the data inside the tags as strings in variables like this: jobtitle = "Programmer Analyst 3-IT" and so on... I am currently using Nokogiri to generate XML but I
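As a rough illustration of the output shape only, here is a minimal sketch in plain Ruby that does not use Ox or Nokogiri. It assumes the `<cdata>` placeholders in the sample stand for real CDATA sections (`<![CDATA[...]]>`); the helper names `cdata` and `build_posting_xml` and the `posting` hash are hypothetical and not part of any library.

```ruby
# Wrap text in a CDATA section. A literal "]]>" inside the text would end the
# section early, so it is split across two adjacent sections.
def cdata(text)
  "<![CDATA[#{text.to_s.gsub(']]>', ']]]]><![CDATA[>')}]]>"
end

# Build the Jobpostings document for one posting.
# `posting` maps element names to their string values (hypothetical shape).
def build_posting_xml(posting)
  fields = posting.map { |tag, value| "      <#{tag}>#{cdata(value)}</#{tag}>" }
  [
    "<Jobpostings>",
    "  <Postings>",
    "    <Posting>",
    *fields,
    "    </Posting>",
    "  </Postings>",
    "</Jobpostings>"
  ].join("\n")
end

# Values taken from the sample in the question.
posting = {
  "JobTitle"       => "Programmer Analyst 3-IT",
  "Location"       => "Romania,Bucharest...",
  "CountryCode"    => "US",
  "JobDescription" => "class technology to develop."
}

puts build_posting_xml(posting)
```

String concatenation like this only escapes the CDATA terminator; element names are trusted as-is, so a real XML builder is the safer choice once the structure grows.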

Installing nokogiri - Failed to build gem native extension

While installing Nokogiri on Ubuntu 12, I got an error:

    Installing nokogiri (1.4.4) with native extensions
    Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension.

    /usr/bin/ruby1.9.1 extconf.rb
    extconf.rb:10: Use RbConfig instead of obsolete and deprecated Config.
    checking for libxml/parser.h... yes
    checking for libxslt/xslt.h... yes
    checking for libexslt/exslt.h... yes
    checking for iconv_open() in iconv.h... yes
    checking for xmlParseDoc() in -lxml2... yes
    checking for xsltParseStylesheetDoc() in -lxslt... yes
    checking for exsltFuncRegister() in -lexslt... yes
    checking

Getting all links of a webpage using Ruby

Question: I'm trying to retrieve every external link of a webpage using Ruby. I'm using String.scan with this regex: /href="https?:[^"]*|href='https?:[^']*/i Then I can use gsub to remove the href part: str.gsub(/href=['"]/, '') This works fine, but I'm not sure whether it's efficient in terms of performance. Is this OK to use, or should I work with a more specific parser (Nokogiri, for example)? Which way is better? Thanks!

Answer 1: Why don't you use groups in your pattern? E.g. /http[s]?:\/\/(.+)/i so the first
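The capture-group idea can be sketched as follows. This is one possible pattern under stated assumptions, not the answerer's exact regex: the constant `HREF_PATTERN`, the method `external_links`, and the sample HTML are made up for illustration. With groups in the pattern, `String#scan` returns only the captured URL, so no follow-up `gsub` is needed.

```ruby
# Match href="http(s):..." or href='http(s):...' and capture only the URL.
# One group per quote style; for each match exactly one group is non-nil.
HREF_PATTERN = /href\s*=\s*(?:"(https?:[^"]*)"|'(https?:[^']*)')/i

# Return every absolute http(s) link target found in the HTML string.
def external_links(html)
  html.scan(HREF_PATTERN).map { |double_quoted, single_quoted| double_quoted || single_quoted }
end

sample_html = <<~HTML
  <a href="http://example.com/a">A</a>
  <a HREF='https://example.org/b'>B</a>
  <a href="/relative">C</a>
HTML

puts external_links(sample_html)
```

A regex like this will also match href-like text inside comments or scripts and will miss unquoted attribute values, which is the usual argument for an HTML parser when correctness matters more than raw speed.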

How to search by attribute value

Question: I have the following XML doc:

    <files>
      <elements xsi:type="foo:elementType1">
        <name>foo1</name>
      </elements>
      <elements xsi:type="foo:elementType1">
        <name>foo2</name>
        <other>
          <elements>
            <data1>data1</data1>
            <data2>data2</data2>
          </elements>
        </other>
      </elements>
      <elements>
        <name>foo3</name>
        <affiliates>
          <elements xsi:type="foo:elementType1">
            <name>foo4</name>
          </elements>
        </affiliates>
      </elements>
    </files>

I need to grab only the elements which have type = "foo:elementType1". I tried this, but I'm

How to get text after or before certain tags using Nokogiri

I have an HTML document, something like this:

    <root><template>title</template>
    <h level="3" i="3">Something</h>
    <template element="1"><title>test</title></template>
    # one
    # two
    # three
    # four
    <h level="4" i="5">something1</h>
    some random test
    <template element="1"><title>test</title></template>
    # first
    # second
    # third
    # fourth
    <template element="2"><title>testing</title></template>
    </root>

I want to extract:

    # one
    # two
    # three
    # four
    # first
    # second
    # third
    # fourth

In other words, I want "all text after <template element="1"><title>test</title></template> and before the next tag that

How do I scrape data from a page that loads specific data after the main page load?

I have been using Ruby and Nokogiri to pull data from a URL similar to this one from the Hollister website: http://www.hollisterco.com/webapp/wcs/stores/servlet/TrackDetail?storeId=10251&catalogId=10201&langId=-1&URL=TrackDetailView&orderNumber=1316358

My script looks like this right now:

    require 'rubygems'
    require 'nokogiri'
    require 'open-uri'

    # URI.open comes from open-uri; plain open() no longer accepts URLs on Ruby 3.0+
    page = Nokogiri::HTML(URI.open("http://www.hollisterco.com/webapp/wcs/stores/servlet/TrackDetail?storeId=10251&catalogId=10201&langId=-1&URL=TrackDetailView&orderNumber=1316358"))
    puts page.css("h3[data-property=GLB_ORDERNUMBERSYMBOL]")[0].text

My problem

Possible to load Nokogiri in JRuby without installing nokogiri-java?

I need a way to run the following Nokogiri script:

    # parser.rb
    require 'nokogiri'
    def parseit()
      # ...
    end

and call parseit() while running main.rb below in JRuby:

    # main.rb
    require 'parser'
    parseit()

Of course, the problem is that JRuby cannot find 'nokogiri', as I have not installed it (aka nokogiri-java) via jruby -S gem install nokogiri. The reason is that I found a bug in Nokogiri running under JRuby, so I have only installed Nokogiri on Ruby, not JRuby. parser.rb runs perfectly under plain Ruby. So my objective is to be able to run parseit() without having to install Nokogiri on JRuby!

How to Get the Page Source with Mechanize/Nokogiri

Question: I'm logged into a webpage/servlet using Mechanize. I have a page object:

    jobShortListPg = agent.get(addressOfPage)

When I use the following:

    puts jobShortListPg

I get the "mechanized" version of the page, which I don't want, e.g.

    #<Mechanize::Page::Link "Home" "blahICScriptProgramName=WEBLIB_MENU.ISCRIPT3.FieldFormula.IScript_DrillDown&target=main0&Level=0&RL=&navc=3171">

How do I get the HTML source of the page instead?

Answer 1: Use .body

    puts jobShortListPg.body

Answer 2: Use the content method of the

Following a link using Nokogiri for scraping

Is there a method to follow a link using Nokogiri for scraping? I know I can extract the href and open it, but I thought I saw a method to do this using Hpricot and was wondering if there was something like that in Nokogiri.

Answer (dbyrne): Here is an excellent screen scraping guide for using Ruby, Nokogiri, Hpricot, and Firebug. Personally I am a big fan of using Mechanize, which is a headless browser, for screen scraping. You can use Mechanize to navigate links and fill out forms, and it will handle all the tricky stuff like cookies.

Source: https://stackoverflow.com/questions/2807500/following-a-link

Ruby Mechanize, Nokogiri and Net::HTTP

I am using Net::HTTP for HTTP requests and getting a response back:

    uri = URI("http://www.example.com")
    http = Net::HTTP.start(uri.host, uri.port, proxy_host, proxy_port)
    request = Net::HTTP::Get.new uri.request_uri
    response = http.request request # Net::HTTPResponse object
    body = response.body

If I have to use the Nokogiri gem to parse this HTML response, I will do:

    nokogiri_obj = Nokogiri::HTML(body)

But if I want to use the Mechanize gem, I need to do this:

    agent = Mechanize.new
    mechanize_obj = agent.get("http://www.example.com")

Is it possible for me to use Net::HTTP for getting the