hpricot

NoClassDefFoundError on org.jruby.Main

左心房为你撑大大i submitted on 2020-01-06 03:01:06
Question: I'm trying to install the hpricot gem on my Windows machine using JRuby 1.4.0RC1. I'm trying to follow the advice from the related question (see -> Installing hpricot for JRuby). Per the answer's advice, I pulled the git head of hpricot and from its dir ran:

    jruby -S rake package_jruby
    cd pkg
    sudo jgem install ./hpricot-0.8.1-jruby.gem

But when I run this I get the following NoClassDefFoundError:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/jruby/Main
    Caused by: java.lang

How do you know when to use an XML parser and when to use ActiveResource?

流过昼夜 submitted on 2020-01-05 03:20:28
Question: I tried using ActiveResource to parse a web service that was more like an HTML document and I kept getting a 404 error. Do I need to use an XML parser for this task instead of ActiveResource? My guess is that ActiveResource is only useful if you are consuming data from another Rails app and the XML data is easily translatable to a Rails model. For example, if the web service is more wide-ranging XML like an HTML document or an RSS feed, you want to use a parser like hpricot or nokogiri. Is this

Convert HTML to plain text and maintain structure/formatting, with ruby

放肆的年华 submitted on 2020-01-02 04:36:05
Question: I'd like to convert HTML to plain text. I don't want to just strip the tags, though; I'd like to intelligently retain as much formatting as possible: inserting line breaks for <br> tags, detecting paragraphs and formatting them as such, etc. The input is pretty simple, usually well-formatted HTML (not entire documents, just a bunch of content, usually with no anchors or images). I could put together a couple of regexes that get me 80% there but figured there might be some existing solutions with

How do I do a regex search in Nokogiri for text that matches a certain beginning?

假装没事ソ submitted on 2019-12-31 11:43:29
Question: Given:

    require 'rubygems'
    require 'nokogiri'
    value = Nokogiri::HTML.parse(<<-HTML_END)
    "<html>
      <body>
        <p id='para-1'>A</p>
        <div class='block' id='X1'>
          <h1>Foo</h1>
          <p id='para-2'>B</p>
        </div>
        <p id='para-3'>C</p>
        <h2>Bar</h2>
        <p id='para-4'>D</p>
        <p id='para-5'>E</p>
        <div class='block' id='X2'>
          <p id='para-6'>F</p>
        </div>
      </body>
    </html>"
    HTML_END

I want to do something like what I can do in Hpricot:

    divs = value.search('//div[@id^="para-"]')

How do I do a pattern search for elements in XPath

Nokogiri vs Hpricot?

断了今生、忘了曾经 submitted on 2019-12-31 08:58:05
Question: Which one would you choose? My important attributes are (not in order): Support and future enhancements. Community and general knowledge base (on the Internet). Comprehensive (i.e., proven to parse a wide range of *.*ml pages). Performance. Memory footprint (runtime, not the code-base).
Answer 1: Pick Nokogiri, for all points and especially point one: Hpricot is no longer maintained. Meta answer: See ruby-toolbox to get an idea of the popularity of different tools in a given area.
Answer 2: Only pick

Screen scraping through nokogiri or hpricot

可紊 submitted on 2019-12-25 18:11:10
Question: I'm trying to get the actual value of a given XPath. I have the following code in a sample.rb file:

    require 'rubygems'
    require 'nokogiri'
    require 'open-uri'

    doc = Nokogiri::HTML(open('http://www.changebadtogood.com/'))

    desc "Trying to get the value of given xpath"
    task :sample do
      begin
        doc.xpath('//*[@id="view_more"]').each do |link|
          puts link.content
        end
      rescue Exception => e
        puts "error"
      end
    end

The output is: View more issues .. When I try to get the value for a different XPath, such as:

How to pull data from KML/XML?

拈花ヽ惹草 submitted on 2019-12-24 11:52:37
Question: I have some data I converted to XML from a KML file and I was curious how to use PHP or Ruby to get back things like the neighborhood names and coordinates. I know how to do this when they have a tag around them, like so: <cities> <neighborhood>Gotham</neighborhood> </cities> but the data is unfortunately formatted as: <SimpleData name="neighborhd">Colgate Center</SimpleData> instead of <neighborhd>Colgate Center</neighborhd> This is the KML source: How can I use PHP or Ruby to pull data from something like
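For the `SimpleData` layout described above, selecting on the `name` attribute rather than on the tag name is one option. The following is a minimal sketch using REXML, which ships with Ruby. The helper `simple_data_values` is hypothetical, and the sample fragment is invented for illustration: only the `neighborhd` line comes from the question, and the fragment deliberately carries no namespace declaration. Real KML files declare a default namespace, which may need extra handling.

```ruby
require 'rexml/document'

# Invented sample fragment (no xmlns declaration); only the "neighborhd"
# element is taken from the question, the rest is illustrative.
SAMPLE_KML = <<~XML
  <Placemark>
    <ExtendedData>
      <SchemaData>
        <SimpleData name="neighborhd">Colgate Center</SimpleData>
        <SimpleData name="city">Gotham</SimpleData>
      </SchemaData>
    </ExtendedData>
  </Placemark>
XML

# Hypothetical helper: text of every <SimpleData> element whose name
# attribute equals +field_name+. The name is interpolated directly into
# the XPath, so it must not contain a single quote.
def simple_data_values(xml, field_name)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//SimpleData[@name='#{field_name}']").map(&:text)
end

puts simple_data_values(SAMPLE_KML, 'neighborhd').inspect
```

The same attribute predicate, `[@name='neighborhd']`, should carry over to other XPath-capable parsers.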

open-uri is not redirecting http to https

为君一笑 submitted on 2019-12-19 13:12:07
Question: I am using Hpricot and OpenURI to parse webpages and extract URLs from them. When I get a link like "http:rapidshare.com", it is not redirecting to https. This is the error I got: /home/leonidus/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/open-uri.rb:216:in `open_loop': redirection forbidden: http:.................=> https:......................... . . I tried to use the exception handler OPENURI::HTTPREDIRECT but then again I am getting the same error. I tried all the blogs but it is not
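One workaround for the "redirection forbidden" error described above is to follow redirects by hand with Net::HTTP instead of relying on open-uri. The following is a minimal sketch under stated assumptions: it targets a Ruby version in which `Net::HTTP.get_response` enables TLS for `https` URIs, the helper names `redirect_target` and `fetch_following_redirects` are hypothetical, and the redirect limit of 5 is an arbitrary choice.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: resolve a Location header against the URL that
# produced it, so both absolute and relative redirects are handled.
def redirect_target(current_url, location)
  URI.join(current_url, location).to_s
end

# Hypothetical helper: GET +url+, following redirects (including
# http -> https) up to +limit+ hops, and return the final response.
def fetch_following_redirects(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit <= 0

  response = Net::HTTP.get_response(URI.parse(url))
  if response.is_a?(Net::HTTPRedirection) && response['location']
    fetch_following_redirects(redirect_target(url, response['location']), limit - 1)
  else
    response
  end
end
```

A caller could then pass `fetch_following_redirects(url).body` to the HTML parser. The hop limit guards against redirect loops.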

Removing anything between XML tags and their content

僤鯓⒐⒋嵵緔 submitted on 2019-12-19 04:21:43
Question: I would need to remove anything between XML tags, especially whitespace and newlines. For example, removing whitespace and newlines from: </node> \n<node id="whatever"> to get: </node><node id="whatever"> This is not meant for parsing XML by hand, but rather to prepare XML data before it's getting parsed by a tool. To be more specific, I'm using Hpricot (Ruby) to parse XML and unfortunately we're currently stuck on version 0.6.164, so ... I don't know about more recent versions, but this one
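The preprocessing step described above can be sketched with a single regular expression that drops whitespace-only runs sitting between a closing `>` and the next `<`. This is a minimal sketch, not a general XML normalizer: the helper name `strip_inter_tag_whitespace` is hypothetical, and the pattern would also rewrite matching sequences inside comments or CDATA sections.

```ruby
# Hypothetical helper: remove whitespace-only runs between adjacent tags.
# Text that contains non-whitespace characters is left untouched.
def strip_inter_tag_whitespace(xml)
  xml.gsub(/>\s+</, '><')
end

puts strip_inter_tag_whitespace("</node> \n<node id=\"whatever\">")
# prints: </node><node id="whatever">
```

The cleaned string can then be handed to the parser as usual.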