nokogiri

Generating XML with cdata using Ox?

回眸只為那壹抹淺笑 submitted on 2019-12-04 19:29:43
I need to generate XML using Ox but didn't get much help from the documentation. I need to generate XML like this: <Jobpostings> <Postings> <Posting> <JobTitle><cdata>Programmer Analyst 3-IT</cdata></JobTitle> <Location><cdata>Romania,Bucharest...</cdata></Location> <CountryCode><cdata>US</cdata> </CountryCode> <JobDescription><cdata>class technology to develop.</cdata></JobDescription> </Posting> </Postings> </Jobpostings> I have the data inside the tags as strings in variables like this: jobtitle = "Programmer Analyst 3-IT" and so on... I am currently using Nokogiri to generate XML but I
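For the CDATA question above, here is a minimal sketch of the general idea. It uses REXML, which ships with standard Ruby installs, rather than Ox, so it is a swapped-in library and not the Ox API the asker wanted. It assumes the intended output uses real CDATA sections, written as <![CDATA[...]]>, instead of literal <cdata> elements. The variable values are hypothetical stand-ins for the asker's strings.

```ruby
require 'rexml/document'

# Hypothetical values standing in for the asker's string variables.
jobtitle = 'Programmer Analyst 3-IT'
location = 'Romania,Bucharest'

# Build the nested container elements from the question's sample.
doc = REXML::Document.new
posting = doc.add_element('Jobpostings')
             .add_element('Postings')
             .add_element('Posting')

# Wrap each value in a CDATA node so the text is written verbatim
# between <![CDATA[ and ]]> instead of being entity-escaped.
{ 'JobTitle' => jobtitle, 'Location' => location }.each do |tag, value|
  posting.add_element(tag) << REXML::CData.new(value)
end

xml = doc.to_s
puts xml
```

The same shape (one element per field, one CDATA child per element) should carry over to other XML builders, though the class and method names will differ.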

Installing nokogiri - Failed to build gem native extension

左心房为你撑大大i submitted on 2019-12-04 19:12:53
While installing Nokogiri on Ubuntu 12, I got an error: Installing nokogiri (1.4.4) with native extensions Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension. /usr/bin/ruby1.9.1 extconf.rb extconf.rb:10: Use RbConfig instead of obsolete and deprecated Config. checking for libxml/parser.h... yes checking for libxslt/xslt.h... yes checking for libexslt/exslt.h... yes checking for iconv_open() in iconv.h... yes checking for xmlParseDoc() in -lxml2... yes checking for xsltParseStylesheetDoc() in -lxslt... yes checking for exsltFuncRegister() in -lexslt... yes checking

Getting all links of a webpage using Ruby

白昼怎懂夜的黑 submitted on 2019-12-04 19:11:34
Question: I'm trying to retrieve every external link of a webpage using Ruby. I'm using String.scan with this regex: /href="https?:[^"]*|href='https?:[^']*/i Then, I can use gsub to remove the href part: str.gsub(/href=['"]/, '') This works fine, but I'm not sure if it's efficient in terms of performance. Is this OK to use, or should I work with a more specific parser (Nokogiri, for example)? Which way is better? Thanks!
Answer 1: Why don't you use groups in your pattern? E.g. /http[s]?:\/\/(.+)/i so the first
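The capture-group idea from the answer can be sketched as follows. This is an illustration only: the sample markup, the LINK_PATTERN constant, and the external_links helper are hypothetical names that do not appear in the original thread, and a regex like this will miss unquoted or oddly formatted attributes that a real HTML parser would handle.

```ruby
# Hypothetical sample markup, not taken from the original question.
html = <<~HTML
  <a href="https://example.com/a">A</a>
  <a href='http://example.org/b'>B</a>
  <a href="/relative">C</a>
HTML

# Group 1 captures the opening quote character; group 2 captures the URL.
# The backreference \1 requires the same quote character to close the value,
# so no follow-up gsub is needed to strip the href= prefix.
LINK_PATTERN = /href=(["'])(https?:.*?)\1/i

def external_links(html)
  # With capture groups, String#scan yields one [quote, url] pair per match.
  html.scan(LINK_PATTERN).map { |_quote, url| url }
end

links = external_links(html)
puts links
```

Relative links such as /relative are skipped because the pattern requires an http: or https: scheme.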

How to search by attribute value

蹲街弑〆低调 submitted on 2019-12-04 18:34:31
Question: I have the following XML doc: <files> <elements xsi:type="foo:elementType1"> <name>foo1</name> </elements> <elements xsi:type="foo:elementType1"> <name>foo2</name> <other> <elements> <data1>data1</data1> <data2>data2</data2> </elements> </other> </elements> <elements> <name>foo3</name> <affiliates> <elements xsi:type="foo:elementType1"> <name>foo4</name> </elements> </affiliates> </elements> </files> I need to grab only the elements which have type = "foo:elementType1". I tried this, but I'm

How to get text after or before certain tags using Nokogiri

青春壹個敷衍的年華 submitted on 2019-12-04 18:06:16
I have an HTML document, something like this: <root><template>title</template> <h level="3" i="3">Something</h> <template element="1"><title>test</title></template> # one # two # three # four <h level="4" i="5">something1</h> some random test <template element="1"><title>test</title></template> # first # second # third # fourth <template element="2"><title>testing</title></template> I want to extract: # one # two # three # four # first # second # third # fourth </root> In other words, I want "all text after <template element="1"><title>test</title></template> and before the next tag that

How do I scrape data from a page that loads specific data after the main page load?

只谈情不闲聊 submitted on 2019-12-04 17:23:35
I have been using Ruby and Nokogiri to pull data from a URL similar to this one from the Hollister website: http://www.hollisterco.com/webapp/wcs/stores/servlet/TrackDetail?storeId=10251&catalogId=10201&langId=-1&URL=TrackDetailView&orderNumber=1316358 My script looks like this right now: require 'rubygems' require 'nokogiri' require 'open-uri' page = Nokogiri::HTML(open("http://www.hollisterco.com/webapp/wcs/stores/servlet/TrackDetail?storeId=10251&catalogId=10201&langId=-1&URL=TrackDetailView&orderNumber=1316358")) puts page.css("h3[data-property=GLB_ORDERNUMBERSYMBOL]")[0].text My problem

Possible to load Nokogiri in JRuby without installing nokogiri-java?

我是研究僧i submitted on 2019-12-04 16:56:16
I need a way to run the following Nokogiri script #parser.rb require 'nokogiri' def parseit() # ... end and call parseit() while running the main.rb below in JRuby #main.rb require 'parser' parseit() Of course, the problem is that JRuby cannot find 'nokogiri', as I have not installed it (aka nokogiri-java) via jruby -S gem install nokogiri The reason is that I found a bug in Nokogiri running under JRuby, so I have only installed Nokogiri on Ruby, not JRuby. parser.rb runs perfectly under plain Ruby. So my objective is to be able to run parseit() without having to install Nokogiri on JRuby!

How to Get the Page Source with Mechanize/Nokogiri

心不动则不痛 submitted on 2019-12-04 16:30:57
Question: I'm logged into a webpage/servlet using Mechanize. I have a page object jobShortListPg = agent.get(addressOfPage) When I use the following puts jobShortListPg I get the "mechanized" version of the page, which I don't want, e.g. #<Mechanize::Page::Link "Home" "blahICScriptProgramName=WEBLIB_MENU.ISCRIPT3.FieldFormula.IScript_DrillDown&target=main0&Level=0&RL=&navc=3171"> How do I get the HTML source of the page instead?
Answer 1: Use .body puts jobShortListPg.body
Answer 2: Use the content method of the

Following a link using Nokogiri for scraping

人盡茶涼 submitted on 2019-12-04 16:09:15
Is there a method to follow a link using Nokogiri for scraping? I know I can extract the href and open it, but I thought I saw a method to do this using Hpricot and was wondering if there was something like that in Nokogiri.
Answer (dbyrne): Here is an excellent screen scraping guide for using Ruby, Nokogiri, Hpricot, and Firebug. Personally, I am a big fan of using Mechanize, which is a headless browser, for screen scraping. You can use Mechanize to navigate links and fill out forms, and it will handle all the tricky stuff like cookies.
Source: https://stackoverflow.com/questions/2807500/following-a-link

Ruby Mechanize, Nokogiri and Net::HTTP

南笙酒味 submitted on 2019-12-04 14:58:02
I am using Net::HTTP for HTTP requests and getting a response back: uri = URI("http://www.example.com") http = Net::HTTP.start(uri.host, uri.port, proxy_host, proxy_port) request = Net::HTTP::Get.new uri.request_uri response = http.request request # Net::HTTPResponse object body = response.body If I have to use the Nokogiri gem to parse this HTML response, I will do: nokogiri_obj = Nokogiri::HTML(body) But if I want to use the Mechanize gem, I need to do this: agent = Mechanize.new mechanize_obj = agent.get("http://www.example.com") Is it possible for me to use Net::HTTP for getting the