I would like to know how can I read the contents of a doc or docx. I\'m using a Linux VPS and PHP, but if there is a simpler solution using other language, please let me kno
I wrote a library that parses the docx, odt and rtf documents based on answers here and elsewhere.
The major improvement I have made to the .docx and .odt parsing is the that the library processes the XML that describes the document and attempts to conform it to HTML tags, i.e. em and strong tags. This means that if you're using the library for a CMS, text formatting is not lost
You can get it here