Getting PHP to read .doc files on Linux

没有蜡笔的小新 2020-12-18 05:41

I'm trying to read a .doc file into a database so that I can index its contents. Is there an easy way for PHP on Linux to read .doc files? Failing that, is it possible to

8 answers
  • 2020-12-18 06:33

    Conor, I'd suggest looking at the OpenOffice command-line interface and calling macros through it. OpenOffice can convert between many file formats, so you can pick a target that is much easier to parse than MS .doc.

    For instance, to convert to PDF, a command line is:

    /usr/lib/ooo-2.0/program/soffice.bin -norestore -nofirststart -nologo -headless -invisible   "macro:///Standard.Module1.SaveAsPDF(demo.doc)"
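
Since the question asks about PHP, the command above could be launched from a script roughly as follows. This is only a sketch: `buildSofficeCommand` and `convertDocToPdf` are hypothetical helper names, the flags are copied from the command line in the answer, and `SaveAsPDF` is assumed to be a Basic macro you have already defined in `Standard.Module1` (the answer does not show its body).

```php
<?php
// Hypothetical helper: assembles the soffice command line quoted in the answer.
function buildSofficeCommand(string $sofficeBin, string $docPath): string
{
    // Assumption: SaveAsPDF is a user-defined macro in Standard.Module1.
    $macro = 'macro:///Standard.Module1.SaveAsPDF(' . $docPath . ')';
    $flags = ['-norestore', '-nofirststart', '-nologo', '-headless', '-invisible'];

    // escapeshellarg() keeps spaces or quotes in the paths from breaking the command.
    return escapeshellarg($sofficeBin) . ' ' . implode(' ', $flags) . ' ' . escapeshellarg($macro);
}

// Hypothetical wrapper: runs the command and reports whether soffice exited cleanly.
function convertDocToPdf(string $sofficeBin, string $docPath): bool
{
    exec(buildSofficeCommand($sofficeBin, $docPath), $output, $exitCode);
    return $exitCode === 0;
}
```

A call would look like `convertDocToPdf('/usr/lib/ooo-2.0/program/soffice.bin', 'demo.doc')`. An exit code of 0 only means the process finished, so it is worth also checking that the output file your macro writes actually exists.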
    
  • 2020-12-18 06:34

    It's not PHP, but there is a doc2rtf utility that you can use. Once you have the RTF file, you can read it as plain text, write some string-replacement routines to strip the RTF formatting codes, and end up with a blob of text suitable for indexing.

    Alternatively, you can install OpenOffice, open the MS Word documents in it, and use File > Save As > RTF.
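
The "string replacement routines" mentioned above might look something like the sketch below. It is a simplification, not a full RTF parser: `rtfToPlainText` is a hypothetical name, hex escapes such as `\'e9` are dropped rather than decoded, `\uN` Unicode escapes leave their fallback character behind, and embedded binary data is not handled. For building a search index that is often good enough.

```php
<?php
// Hypothetical helper: reduces RTF markup to roughly the text a reader would see.
function rtfToPlainText(string $rtf): string
{
    // 1. Drop whole groups that carry no body text (font table, colour table,
    //    stylesheet, document info, and optional "{\*...}" destinations),
    //    including any nested groups inside them.
    $skip = '~\{\\\\(?:\*|fonttbl|colortbl|stylesheet|info)(?:[^{}]++|(\{(?:[^{}]++|(?1))*\}))*\}~';
    $rtf = preg_replace($skip, '', $rtf) ?? $rtf;

    // 2. Walk the remaining markup token by token: control words, hex escapes,
    //    control symbols, bare braces, and raw line breaks (which RTF ignores).
    $token = '~\\\\([a-zA-Z]+)(-?\d+)? ?|\\\\\'[0-9a-fA-F]{2}|\\\\(.)|[{}\r\n]~s';
    $text = preg_replace_callback($token, function (array $m): string {
        $word = $m[1] ?? '';
        if ($word !== '') {
            if ($word === 'par' || $word === 'line') {
                return "\n";          // paragraph or line break
            }
            return $word === 'tab' ? "\t" : '';   // other control words are formatting only
        }
        $symbol = $m[3] ?? '';
        if ($symbol === '\\' || $symbol === '{' || $symbol === '}') {
            return $symbol;           // escaped literal character
        }
        if ($symbol === '~') {
            return ' ';               // non-breaking space
        }
        return '';                    // braces, hex escapes, raw CR/LF, other symbols
    }, $rtf) ?? '';

    return trim($text);
}
```

For example, feeding it the contents of a converted file with `rtfToPlainText(file_get_contents('demo.rtf'))` (the file name here is just an illustration) should give a string you can store in the database and index.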
