Extract text from doc and docx

死守一世寂寞 2020-11-27 16:24

I would like to know how I can read the contents of a doc or docx file. I'm using a Linux VPS and PHP, but if there is a simpler solution using another language, please let me know.

9 answers
  •  心在旅途
    2020-11-27 16:41

    Parse .docx, .odt, .doc and .rtf documents

    I wrote a library that parses the docx, odt and rtf documents based on answers here and elsewhere.

    The major improvement I have made to the .docx and .odt parsing is that the library processes the XML that describes the document and attempts to map it to HTML tags, such as em and strong. This means that if you're using the library for a CMS, text formatting is not lost.

    You can get it here
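
To illustrate the idea described above (reading the XML inside a .docx and mapping its formatting to HTML tags), here is a minimal sketch in Python using only the standard library. It is not the library's actual code. The function name `docx_to_html` and the helper `_is_on` are made up for this example, and it assumes a standard .docx whose body lives in `word/document.xml`. It only handles bold and italic runs, and it does not cover the legacy binary .doc format.

```python
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML namespace used by elements in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _is_on(props, tag):
    """Hypothetical helper: True if run properties enable a toggle such as w:b or w:i."""
    if props is None:
        return False
    el = props.find(W + tag)
    if el is None:
        return False
    # <w:b/> with no w:val means "on"; an explicit false/0/off turns it off
    return el.get(W + "val", "true") not in ("false", "0", "off")


def docx_to_html(source):
    """Return simple HTML for a .docx given a path or a binary file-like object."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    paragraphs = []
    for p in root.iter(W + "p"):          # each w:p is a paragraph
        parts = []
        for r in p.iter(W + "r"):         # each w:r is a run of uniformly formatted text
            text = "".join(t.text or "" for t in r.iter(W + "t"))
            if not text:
                continue
            text = escape(text)
            props = r.find(W + "rPr")
            if _is_on(props, "i"):
                text = "<em>" + text + "</em>"
            if _is_on(props, "b"):
                text = "<strong>" + text + "</strong>"
            parts.append(text)
        if parts:
            paragraphs.append("<p>" + "".join(parts) + "</p>")
    return "\n".join(paragraphs)
```

Calling `docx_to_html("report.docx")` (a hypothetical file name) returns one `<p>` element per non-empty paragraph. The same approach carries over to PHP, where the archive can be opened with `ZipArchive` and the XML read with `DOMDocument` or `SimpleXML`.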
