nokogiri

Parsing Large XML files w/ Ruby & Nokogiri

被刻印的时光 ゝ posted on 2019-12-12 07:53:30
Question: I have a large XML file (about 10K rows) that I need to parse regularly. It is in this format: <summarysection> <totalcount>10000</totalcount> </summarysection> <items> <item> <cat>Category</cat> <name>Name 1</name> <value>Val 1</value> </item> ...... 10,000 more times </items> What I'd like to do is parse each of the individual nodes using Nokogiri to count the number of items in one category. Then I'd like to subtract that number from the total_count to get an output that reads "Count of

How to handle 404 not found errors in Nokogiri

微笑、不失礼 posted on 2019-12-12 07:50:13
Question: I am using Nokogiri to scrape web pages. A few URLs need to be guessed, and they return a 404 Not Found error when they don't exist. Is there a way to capture this exception? http://yoursite/page/38475 #=> page number 38475 doesn't exist I tried the following, which didn't work: url = "http://yoursite/page/38475" doc = Nokogiri::HTML(open(url)) do begin rescue Exception => e puts "Try again later" end end Answer 1: It doesn't work because you are not rescuing the part of the code (the open(url) call) that raises

How to wrap Nokogiri nodeset in ONE span

≯℡__Kan透↙ posted on 2019-12-12 06:27:15
Question: My goal is to wrap all paragraphs after the initial paragraph within a span. I'm trying to figure out how to wrap a nodeset within a single span, but .wrap() wraps each node in its own span. That is, I want: <p>First</p> <p>Second</p> <p>Third</p> To become: <p>First</p> <span> <p>Second</p> <p>Third</p> </span> Any sample code to help? Thanks! Answer 1: I'd do it as below: require 'nokogiri' doc = Nokogiri::HTML::DocumentFragment.parse(<<-html) <p>First</p> <p>Second</p> <p>Third</p> html nodeset =

How to get XML parent attribute value

三世轮回 posted on 2019-12-12 05:16:37
Question: I have multiple statements like: <House name="test1"> <Room id="test2" name="test3" > <test name="test4" param="test5"> <blah id="test6" name="test7"> </blah> </test> </Room> </House> When the blah name is some particular value, like test7, I need the corresponding Room name. How do I achieve that? Answer 1: I have never used Nokogiri, but I tried this and it seems to work: xml_doc.css('blah[name="test7"]').first.ancestors("Room").first['name'] => "test3" Just check for nils. 2.3.1 :132 > xml_doc.css('blah

FF Xpather to Nokogiri — Can I just copy and paste?

你离开我真会死。 posted on 2019-12-12 04:39:46
Question: I was doing this manually, then I got stuck, and I can't figure out why it's not working. I downloaded XPather and it is giving me: /html/body/center/table/tbody/tr[3]/td/table as the path to the item I want. I have manually confirmed that this is correct, but when I paste it into my code, all it does is return nil. Here is my code: a = parentdoc.at_xpath("//html/body/center/table/tbody/tr[3]/td/table[1]") puts a If I do something like this: a = parentdoc.at_xpath("//html/body/center") puts a

Nokogiri Ruby HTML Parser

半腔热情 posted on 2019-12-12 04:25:36
Question: I'm running into problems scraping across multiple pages with Nokogiri. I need to be able to narrow down the results of what I am searching for based on the qualified hrefs first. Here is a script to get all of the hrefs I'm interested in obtaining. However, I'm having trouble parsing out the titles of the articles so that I can link to them. It would be great to know that I can manually inspect the elements so that I have the links I want, and whenever I find a link I want I can also grab

Nokogiri (Ruby): Extract tag contents for a specific attribute inside each node

会有一股神秘感。 posted on 2019-12-12 04:25:08
Question: I have an XML document with the following structure: <Root> <Batch name="value"> <Document id="ID1"> <Tags> <Tag id="ID11" name="name11">Contents</Tag> <Tag id="ID12" name="name12">Contents</Tag> </Tags> </Document> <Document id="ID2"> <Tags> <Tag id="ID21" name="name21">Contents</Tag> <Tag id="ID22" name="name22">Contents</Tag> </Tags> </Document> </Batch> </Root> I want to extract the contents of specific tags for each Document node, using something like this: xml.xpath('//Document/Tags').each do

Install mechanize with Ruby 2.3 on Windows 7 got error

回眸只為那壹抹淺笑 posted on 2019-12-12 04:03:59
Question: I'm trying to install Mechanize with Ruby 2.3 on Windows 7. However, I got the following error. Could anyone point me in the right direction? PS C:\DevKit> ruby --version ruby 2.3.0p0 (2015-12-25 revision 53290) [x64-mingw32] PS C:\DevKit> gem install mechanize Fetching: net-http-digest_auth-1.4.gem (100%) Successfully installed net-http-digest_auth-1.4 Fetching: net-http-persistent-2.9.4.gem (100%) Successfully installed net-http-persistent-2.9.4 Fetching: mime-types-2.99.1.gem (100%)

Using Nokogiri's CSS method to get all elements within an alt tag

扶醉桌前 posted on 2019-12-12 03:44:58
Question: I am trying to use Nokogiri's CSS method to get some names from my HTML. This is an example of the HTML: <section class="container partner-customer padding-bottom--60"> <div> <div> <a id="technologies"></a> <h4 class="center-align">The Team</h4> </div> </div> <div class="consultant list-across wrap"> <div class="engineering"> <img class="" src="https://v0001.jpg" alt="Person 1"/> <p>Person 1<br>Founder, Chairman & CTO</p> </div> <div class="engineering"> <img class="" src="https://v0002.png"

Error installing “nokogiri” in a Ruby on Rails application?

女生的网名这么多〃 posted on 2019-12-12 03:17:35
Question: I've been following along with Lynda.com's Ruby on Rails course. I did everything just as shown in the videos. I am trying to run the rails server command, which should default to WEBrick, correct? When I run the command, it fails in the nokogiri.rb file. Line 29, where the error occurs, reads: require 'nokogiri/nokogiri' That is where my command prompt fails when running the rails server command. Any idea what could be causing this? If so