How to parse Word documents with Ruby?

邮差的信 submitted on 2019-12-21 07:13:50

Question


Does anyone know of a library that I can use on OS X/Linux to parse Word files and output the content as HTML?

I've had a look at win32ole but as far as I can see it's for Windows only, although I could be wrong.

Any suggestions?


Answer 1:


The Word document format (ignoring docx for the moment) is terrible and has changed constantly between versions. IMHO that is why there are so few (read: zero) Ruby libraries out there to parse it.

I recommend using JRuby with one of the established Java libraries for reading the doc format. A search should turn up several; this list is a starting point: http://schmidt.devlib.org/java/libraries-word.html.

There is also a Java project for reading Microsoft file formats, POI (http://poi.apache.org/), which has Ruby bindings (http://poi.apache.org/poi-ruby.html), but I'm not sure how up to date those are. Their site says the Ruby bindings are for 1.8.2...
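
To make the JRuby suggestion concrete, here is a minimal sketch rather than a tested recipe. It assumes you run it under JRuby with the Apache POI jars (including the HWPF module for legacy .doc files) already on the classpath. The method names `paragraphs_to_html` and `extract_doc_paragraphs` are made up for this illustration; `WordExtractor` and its `getParagraphText` method come from POI's HWPF module. The HTML produced is deliberately naive: one `<p>` per paragraph, with no styling, tables, or images.

```ruby
# Characters that must be escaped before text is placed inside HTML.
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Hypothetical helper: wrap each non-empty paragraph in a <p> element.
def paragraphs_to_html(paragraphs)
  paragraphs
    .map(&:strip)                     # drop trailing "\r\n" left by the extractor
    .reject(&:empty?)                 # skip blank paragraphs
    .map { |text| "<p>#{text.gsub(/[&<>"]/, HTML_ESCAPES)}</p>" }
    .join("\n")
end

# Hypothetical helper: read paragraph text from a legacy .doc file via POI.
# Assumption: running under JRuby with the POI jars on the classpath.
def extract_doc_paragraphs(path)
  raise NotImplementedError, "requires JRuby" unless RUBY_PLATFORM == "java"

  require "java"
  stream = java.io.FileInputStream.new(path)
  begin
    extractor = org.apache.poi.hwpf.extractor.WordExtractor.new(stream)
    extractor.getParagraphText.to_a   # Java String[] converted to a Ruby array
  ensure
    stream.close
  end
end
```

Under JRuby, something like `puts paragraphs_to_html(extract_doc_paragraphs("report.doc"))` would print the HTML, where `report.doc` is a placeholder file name. Keeping the HTML step in plain Ruby means it can be tried on any Ruby interpreter, while only the extraction step depends on the Java library.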



Source: https://stackoverflow.com/questions/375861/how-to-parse-word-documents-with-ruby
