Read MS Word .doc file with ruby and win32ole

风流意气都作罢 submitted on 2019-12-11 05:27:21

Question


I'm trying to read a .doc file with Ruby using the win32ole library.

Here is my code:


require 'win32ole'

class DocParser

  def initialize
    @content = ''
  end

  def read_file file_path
    begin
      word = WIN32OLE.connect( 'Word.Application' )
      doc  = word.activedocument
    rescue
      word = WIN32OLE.new( 'Word.Application' )
      doc  = word.documents.open( file_path )
    end
    word.visible = false
    doc.sentences.each{ |x| @content = @content + x.text }

    word.quit
    @content
  end
end

I start reading a document with DocParser.new.read_file('path/file.doc')

When I run this from rails c, I don't have any problems; it works fine. But when I run it through the Rails server (e.g. after a button click), the code crashes every third or fourth time with this error:


WIN32OLERuntimeError (failed to create WIN32OLE object from `Word.Application'
    HRESULT error code:0x800401f0
      CoInitialize has not been called.):
  lib/file_parsers/doc_parser.rb:14:in `initialize'
  lib/file_parsers/doc_parser.rb:14:in `new'
  lib/file_parsers/doc_parser.rb:14:in `rescue in read_file'
  lib/file_parsers/doc_parser.rb:10:in `read_file'
  lib/search_engine.rb:10:in `block in search'
  lib/search_engine.rb:43:in `block in each_file_in'
  lib/search_engine.rb:42:in `each_file_in'
  lib/search_engine.rb:8:in `search'
  app/controllers/home_controller.rb:9:in `search'


  Rendered c:/Ruby193/lib/ruby/gems/1.9.1/gems/actionpack-4.1.1/lib/action_dispatch/middleware/templates/rescues/_source.erb (0.0ms)
  Rendered c:/Ruby193/lib/ruby/gems/1.9.1/gems/actionpack-4.1.1/lib/action_dispatch/middleware/templates/rescues/_trace.text.erb (2.0ms)
  Rendered c:/Ruby193/lib/ruby/gems/1.9.1/gems/actionpack-4.1.1/lib/action_dispatch/middleware/templates/rescues/_request_and_response.text.erb (2.0ms)
  Rendered c:/Ruby193/lib/ruby/gems/1.9.1/gems/actionpack-4.1.1/lib/action_dispatch/middleware/templates/rescues/diagnostics.erb (56.0ms)

Additionally, even when this code reads the .doc file successfully, Rails crashes a few seconds later: look at this gist

What is the problem, and how can I fix it? Please help!


Answer 1:


I don't know the difference between rails c and rails, so I'll just give some general advice.

First, it is a bad idea to run this in a web server: Word is started on the server every time, so what happens if multiple users start using this at the same time?
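To make the concurrency concern concrete, one minimal way to stop several simultaneous requests from driving Word at once is to serialize the calls behind a Mutex. SerializedParser is a hypothetical wrapper, not part of the question; it takes a block that builds a fresh parser for each call, because the question's DocParser accumulates text in @content and should not be shared. This only serializes threads inside one Ruby process (several server processes would still start separate Word instances), and it does not by itself address the CoInitialize error.

```ruby
# Hypothetical wrapper: only one thread at a time may run the Word-based parser.
class SerializedParser
  def initialize(&factory)
    @factory = factory # builds a fresh parser per call (DocParser keeps state in @content)
    @lock = Mutex.new
  end

  def read_file(path)
    # Other threads block here until the current Word session has finished.
    @lock.synchronize { @factory.call.read_file(path) }
  end
end
```

In the app this could be a single shared instance, e.g. a hypothetical constant WORD_PARSER = SerializedParser.new { DocParser.new } used from the controller.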

You'd be better off converting your .doc files to another format such as .rtf or .docx first (a batch conversion?) and then using other gems that don't require Word itself.
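As an illustration of reading converted files without Word: a .docx file is a ZIP package whose body text is stored as XML in word/document.xml, with paragraphs in w:p elements and text runs in w:t elements. The sketch below covers only the XML step, using Ruby's REXML library (a bundled gem in Ruby 3.x). docx_xml_to_text is a hypothetical helper name, and extracting document.xml from the ZIP (for example with a zip gem) is assumed and not shown. A dedicated .docx gem would also handle tables, headers and other parts.

```ruby
require 'rexml/document'

# WordprocessingML namespace used in word/document.xml.
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'

# Hypothetical helper: turn the XML of word/document.xml into plain text,
# one line per paragraph (w:p), joining the text runs (w:t) inside each one.
def docx_xml_to_text(xml)
  doc = REXML::Document.new(xml)
  paragraphs = REXML::XPath.match(doc, '//w:p', 'w' => W_NS)
  paragraphs.map do |p|
    REXML::XPath.match(p, './/w:t', 'w' => W_NS).map { |t| t.text.to_s }.join
  end.join("\n")
end
```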

If you keep it like this, consider not closing Word (remove the word.quit) and closing only the document itself; the running instance will then be picked up by WIN32OLE.connect the next time.
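A minimal sketch of that suggestion, assuming the same WIN32OLE calls the question already uses. WordReader is hypothetical; it takes two callables (connect, create) so the Word-specific parts can be replaced by fakes in tests. Unlike the original code, it always opens the given path (the original reads activedocument whenever connect succeeds), closes only the document without saving, and never calls word.quit.

```ruby
# Hypothetical reader that reuses a running Word instance and never quits it.
class WordReader
  # connect and create are callables returning a Word application object.
  # On Windows, with require 'win32ole':
  #   WordReader.new(-> { WIN32OLE.connect('Word.Application') },
  #                  -> { WIN32OLE.new('Word.Application') })
  def initialize(connect, create)
    @connect = connect
    @create  = create
  end

  def read(path)
    word = begin
      @connect.call        # pick up an already running Word
    rescue StandardError
      @create.call         # none running: start a new one
    end
    word.visible = false
    doc = word.documents.open(path)
    begin
      doc.content.text     # the whole document text as one string
    ensure
      doc.close(false)     # close the document without saving; Word keeps running
    end
  end
end
```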

While testing, you'd better keep Word visible so that you can see what is happening (error dialogs?). I also notice your path uses forward slashes, while backslashes are needed here, but since your code runs a few times before the error, I suppose that is not the problem.
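Both testing tips can be wired in with two small helpers. These are hypothetical (not from the answer), and the WORD_VISIBLE environment variable is an assumed name: the first converts a forward-slash path into the backslash form before it is passed to Word, the second lets you switch Word's visibility on during development without editing code.

```ruby
# Hypothetical helper: Word expects Windows-style paths, so swap / for \.
def to_windows_path(path)
  path.tr('/', '\\')
end

# Hypothetical helper: show Word only when WORD_VISIBLE=1 is set (an assumed
# variable name), so dialogs and errors can be watched while testing.
def word_visible?(env = ENV)
  env['WORD_VISIBLE'] == '1'
end

# Usage with the question's code (Windows only):
#   doc = word.documents.open(to_windows_path(file_path))
#   word.visible = word_visible?
```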

Hope this helps.




Answer 2:


I upgraded Ruby from 1.9.3 to 2.0.0.

Now Rails doesn't crash, and I have no problems with win32ole or with reading old-format MS Word documents.

I guess the problem was memory usage, because newer Ruby (2.0.0 and later) uses a new garbage collector.



Source: https://stackoverflow.com/questions/24033633/read-ms-word-doc-file-with-ruby-and-win32ole
