File system crawler - iteration bugs

Submitted by 余生颓废 on 2019-12-11 04:16:54

Question


I'm currently building a file system crawler with the following code:

require 'find'
require 'spreadsheet'
Spreadsheet.client_encoding = 'UTF-8'

count = 0

Find.find('/Users/Anconia/crawler/') do |file|           
  if file =~ /\b.xls$/                                            # check if filename ends in desired format
    contents =  Spreadsheet.open(file).worksheets
    contents.each do |row|
      if row =~ /regex/
        puts file
        count += 1
      end
    end
  end
end

puts "#{count} files were found"

Running it produces the following output: 0 files were found

The regex is tested and correct - I currently use it in another crawler that works.

The output of row.inspect is

#<Spreadsheet::Excel::Worksheet:0x003ffa5d418538 @row_addresses= @default_format= @selected= @dimensions= @name=Sheet1 @workbook=#<Spreadsheet::Excel::Workbook:0x007ff4bb147140> @rows=[] @columns=[] @links={} @merged_cells=[] @protected=false @password_hash=0 @changes={} @offsets={} @reader=#<Spreadsheet::Excel::Reader:0x007ff4bb1f3b98> @ole=#<Ole::Storage::RangesIOMigrateable:0x007ff4bb126fa8> @offset=15341 @guts={} @rows[3]>

Certainly nothing to iterate over.


Answer 1:


Try this:

content = Spreadsheet.open(file)
sheet = content.worksheet 0 
sheet.each do |row|
...
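
For context, here is a minimal sketch of how the snippet above might fit into the crawler. The helper names `xls_files` and `each_row` are hypothetical and not from the original post, and the extension check via `File.extname` is swapped in for the question's `/\b.xls$/` regex, whose unescaped dot matches any character. It assumes the third-party `spreadsheet` gem is installed and that only the first worksheet matters, as in the answer.

```ruby
require 'find'

# Hypothetical helper: returns every regular file under +root+ whose
# extension is .xls (compared case-insensitively).
def xls_files(root)
  Find.find(root).select do |path|
    File.file?(path) && File.extname(path).casecmp?('.xls')
  end
end

# Hypothetical helper: yields each row of the first worksheet of the
# workbook at +path+, following the answer's `worksheet 0` approach.
def each_row(path)
  return enum_for(:each_row, path) unless block_given?

  require 'spreadsheet' # third-party gem, loaded only when a workbook is opened
  sheet = Spreadsheet.open(path).worksheet(0)
  sheet.each { |row| yield row }
end
```

Calling `xls_files('/Users/Anconia/crawler/')` and passing each returned path to `each_row` would then visit rows rather than worksheets, which is the change the answer suggests.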



Answer 2:


As Diego mentioned, I should have been iterating over the rows of a worksheet rather than over the worksheets themselves; I really appreciate the clarification! Note also that each row must be converted to a string before it can be matched against the regex.
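
The string conversion could look something like the sketch below. The names `row_text` and `row_matches?` are hypothetical, and joining cells with a single space is an assumption: it means a pattern can match across neighbouring cells, which may or may not be what you want. It relies only on the row behaving like an array of cell values.

```ruby
# Hypothetical helper: joins a row's cells into one string.
# Empty (nil) cells become empty strings; numbers are stringified.
def row_text(row)
  row.to_a.map(&:to_s).join(' ')
end

# Hypothetical helper: true when the row's text matches +pattern+.
def row_matches?(row, pattern)
  pattern.match?(row_text(row))
end
```

With `sheet` from the first answer, each `row` yielded by `sheet.each` can be passed to `row_matches?` together with the regex. Incrementing `count` once per matching row, as the question's code does, counts rows rather than files, so stop at the first match in a file if the goal is a file count.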



Source: https://stackoverflow.com/questions/14044357/file-system-crawler-iteration-bugs
