Is there a reasonable way to extract plain text from a Word file that doesn't depend on COM automation? (This is a feature for a web app deployed on a non-Windows platfo
This worked well for .doc and .odt.
It calls OpenOffice on the command line to convert your file to text, which you can then simply load into Python.
(It seems to have other format options, though they do not appear to be documented.)
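
For anyone who wants to do the same thing by hand, here is a minimal sketch of the command-line approach. It is not the tool mentioned above. It assumes a `soffice` binary (as shipped by LibreOffice and recent OpenOffice builds) is on the PATH and accepts `--headless --convert-to`. The function names and the `txt:Text (encoded):UTF8` filter string are my own choices, so check them against your install.

```python
import subprocess
import tempfile
from pathlib import Path


def build_convert_command(src, outdir, binary="soffice"):
    """Build the argument list for a headless conversion to plain text.

    Assumption: `binary` understands --headless / --convert-to / --outdir.
    """
    return [
        binary,
        "--headless",
        "--convert-to", "txt:Text (encoded):UTF8",  # assumed filter name
        "--outdir", str(outdir),
        str(src),
    ]


def extract_text(src, binary="soffice", timeout=120):
    """Convert `src` (.doc, .odt, ...) to text and return it as a string."""
    src = Path(src)
    with tempfile.TemporaryDirectory() as outdir:
        subprocess.run(
            build_convert_command(src, outdir, binary),
            check=True,              # raise if the converter exits non-zero
            timeout=timeout,         # don't let a stuck process hang the web app
            stdout=subprocess.DEVNULL,
        )
        # The converter writes <original stem>.txt into outdir.
        converted = Path(outdir) / (src.stem + ".txt")
        return converted.read_text(encoding="utf-8", errors="replace")
```

Usage would be something like `text = extract_text("report.doc")`. Writing into a temporary directory keeps concurrent requests in a web app from overwriting each other's output files.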