Getting PHP to read .doc files on Linux

没有蜡笔的小新 2020-12-18 05:41

I'm trying to read a .doc file into a database so that I can index its contents. Is there an easy way for PHP on Linux to read .doc files? Failing that, is it possible to

8 answers
  • 2020-12-18 06:17

    There seems to be a library for reading Word documents (wv), but I'm not sure how to access it from PHP directly. I think the best solution would be to call its wv command-line tool from PHP.

  • 2020-12-18 06:17

    After days of searching, here is my best solution: http://wvware.sourceforge.net/

    Install the package:

    sudo apt-get install wv
    

    Use it in PHP:

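    // Name the temporary text file after the source, swapping a trailing .doc for .txt.
    $output = preg_replace('/\.doc$/i', '', $filename) . '.txt';
    // Escape both paths so spaces or shell metacharacters cannot break the command.
    shell_exec('/usr/bin/wvText ' . escapeshellarg($filename) . ' ' . escapeshellarg($output));
    $text = file_get_contents($output);
    // Convert to UTF-8 if needed (assumes ISO-8859-1 input, which is what utf8_encode assumed)
    if (!mb_detect_encoding($text, 'UTF-8', true))
    {
        $text = mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
    }
    unlink($output);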
    
  • 2020-12-18 06:19

    I found a unoconv package in Ubuntu. It converts between all formats supported by OpenOffice. You should be able to use exec in PHP to run this utility.
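
    As a rough sketch of what calling unoconv through exec might look like: the function names buildUnoconvCommand and docToTextWithUnoconv are made up for this illustration, and the code assumes unoconv is installed and on the PATH, and that its -f and --stdout options behave as described in its manual.

```php
<?php
// Hypothetical helper: builds the unoconv command line for one input file.
function buildUnoconvCommand(string $docPath): string
{
    // -f selects the output format; --stdout prints the result instead of writing a file.
    // escapeshellarg() keeps spaces and shell metacharacters in the path harmless.
    return 'unoconv -f txt --stdout ' . escapeshellarg($docPath);
}

// Hypothetical helper: runs the conversion and returns the text, or null on failure.
function docToTextWithUnoconv(string $docPath): ?string
{
    exec(buildUnoconvCommand($docPath), $lines, $exitCode);
    // exec() fills $lines with one entry per output line and sets the exit status.
    return $exitCode === 0 ? implode("\n", $lines) : null;
}
```

    Checking the exit status is worthwhile here, since unoconv needs a working OpenOffice/LibreOffice installation behind it and may fail where that is missing.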

  • 2020-12-18 06:20

    phpLiveDocx is a Zend Framework component that can read and write DOC and RTF files in PHP on Linux, Windows and Mac. You can also use it to generate PDF files and even merge data from PHP into template files created with MS Word or OpenOffice.

    See the project web site at:

    http://www.phplivedocx.org

  • 2020-12-18 06:27

    You can use antiword or AbiWord to pull the text out and feed it to your favorite full-text indexer. AbiWord is probably more effective for your purposes because it can convert into RTF, PDF and other formats (yes, it's a GUI word processor, but it also supports command-line usage).
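
    A minimal sketch of the antiword route, assuming antiword is installed and on the PATH. The function name docToTextWithAntiword is invented for this example, and the -m UTF-8.txt mapping option is taken from antiword's manual page; adjust it if your build ships different mapping files.

```php
<?php
// Hypothetical helper: extracts plain text from a .doc file with antiword.
function docToTextWithAntiword(string $docPath): ?string
{
    // Bail out early so we never hand a missing file to the shell.
    if (!is_readable($docPath)) {
        return null;
    }
    // -m UTF-8.txt asks antiword to use its UTF-8 character mapping for the output.
    $command = 'antiword -m UTF-8.txt ' . escapeshellarg($docPath) . ' 2>/dev/null';
    $text = shell_exec($command);
    // shell_exec() returns null or false when the command fails or prints nothing.
    return is_string($text) && $text !== '' ? $text : null;
}
```

    The returned string can then be stored in the database or passed to whichever full-text indexer you use.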

  • 2020-12-18 06:29

    DOC files are stored in a binary format, and there haven't been any classes written purely in PHP for dealing with them.

    RTF files are much easier to parse. Because they are mostly text, you can just open them with fopen and read the contents.

    I would suggest using RTF if you can, as there really is not a sound solution for DOC files yet.
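
    To illustrate the RTF suggestion, here is a deliberately naive sketch that reads a file with fopen and strips control words with regular expressions. Both function names are hypothetical, and this is not a full RTF parser: font tables, escaped characters such as \'e9, and embedded objects are not handled.

```php
<?php
// Hypothetical helper: a naive RTF-to-text pass, good enough for simple documents only.
function rtfToPlainText(string $rtf): string
{
    // Treat paragraph and line-break control words as newlines.
    $text = preg_replace('/\\\\(par|line)\b ?/', "\n", $rtf);
    // Drop the remaining control words such as \rtf1, \ansi or \fs24.
    $text = preg_replace('/\\\\[a-z]+-?\d* ?/i', '', $text);
    // Remove the braces that delimit RTF groups.
    $text = str_replace(['{', '}'], '', $text);
    return trim($text);
}

// Hypothetical helper: opens the file with fopen and returns its text, or null on failure.
function readRtfFile(string $path): ?string
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }
    $contents = stream_get_contents($handle);
    fclose($handle);
    return $contents === false ? null : rtfToPlainText($contents);
}
```

    For anything beyond simple documents, a real RTF parser or one of the command-line converters mentioned in the other answers is likely a safer choice.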
