Best way to extract text from a Word doc without using COM/automation?

遇见更好的自我 2020-12-07 21:29

Is there a reasonable way to extract plain text from a Word file that doesn't depend on COM automation? (This is a feature for a web app deployed on a non-Windows platfo

10 answers
  •  一个人的身影
    2020-12-07 21:36

    This worked well for .doc and .odt.

    It calls OpenOffice on the command line to convert your file to text, which you can then simply load into Python.

    (It seems to have other format options, though they are not apparently documented.)
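
For illustration, here is a rough sketch of the same idea done directly from Python. It is not the tool referred to above; it assumes a LibreOffice/OpenOffice install whose `soffice` binary is on the PATH and accepts the `--headless --convert-to txt:Text --outdir` options. The function names `build_convert_command` and `extract_text` are made up for this example, and the output encoding may vary by install, so decoding errors are replaced rather than raised.

```python
import subprocess
import tempfile
from pathlib import Path


def build_convert_command(doc_path, out_dir, binary="soffice"):
    """Build the headless conversion command.

    The binary name "soffice" is an assumption; adjust it for your install.
    """
    return [
        binary,
        "--headless",               # run without a GUI
        "--convert-to", "txt:Text",  # plain-text export filter
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def extract_text(doc_path, binary="soffice", timeout=120):
    """Convert a document to text in a temp directory and return the text."""
    doc_path = Path(doc_path)
    with tempfile.TemporaryDirectory() as out_dir:
        subprocess.run(
            build_convert_command(doc_path, out_dir, binary),
            check=True,                 # raise if the converter fails
            timeout=timeout,
            stdout=subprocess.DEVNULL,
        )
        # The converter writes <original stem>.txt into out_dir.
        txt_path = Path(out_dir) / (doc_path.stem + ".txt")
        return txt_path.read_text(encoding="utf-8", errors="replace")
```

Calling `extract_text("report.doc")` would then return the document's text as a string, provided the converter is installed on the server.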
