Best way to extract text from a Word doc without using COM/automation?

遇见更好的自我 2020-12-07 21:29

Is there a reasonable way to extract plain text from a Word file that doesn't depend on COM automation? (This is a feature for a web app deployed on a non-Windows platform.)

10 Answers
  •  旧时难觅i
    2020-12-07 21:52

    Using the OpenOffice API from Python, together with Andrew Pitonyak's excellent online macro book, I managed to do this. Section 7.16.4 is the place to start.
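    The `loadComponentFromURL` call in this answer needs a `desktop` object. One common way to get it from Python is to connect to a separately started office process over a socket. A minimal sketch, assuming the office was launched with `soffice --headless --accept="socket,host=localhost,port=2002;urp;"` and that this Python can import the office's `uno` module; the host, port, and the helper names `resolver_url` and `connect_desktop` are illustrative choices, not part of the original answer.

```python
# Minimal sketch. Assumptions (not from the original answer):
#   - an office process was started with
#       soffice --headless --accept="socket,host=localhost,port=2002;urp;"
#   - this Python can import the office's 'uno' (PyUNO) module


def resolver_url(host="localhost", port=2002):
    """Build the UNO URL of a listening office's component context."""
    return f"uno:socket,host={host},port={port};urp;StarOffice.ComponentContext"


def connect_desktop(host="localhost", port=2002):
    """Return the remote com.sun.star.frame.Desktop used to load documents."""
    # Imported lazily: usually only the office's bundled Python has 'uno'.
    import uno

    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    # Connect over the socket to the running office.
    ctx = resolver.resolve(resolver_url(host, port))
    return ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)
```

    With that in place, `desktop = connect_desktop()` gives the object used in the snippet, and `uno.systemPathToFileUrl("/absolute/path/to/file.doc")` turns a local absolute path into the URL `loadComponentFromURL` expects. Importing `uno` inside the function keeps the module importable on machines without an office installation.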

    Another tip, to make it work without the document ever appearing on screen, is to set the Hidden property:

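    import uno  # PyUNO; must be imported before any com.sun.star imports
    from com.sun.star.beans import PropertyValue

    # desktop: a com.sun.star.frame.Desktop; docpath: a file:// URL to the .doc
    RO = PropertyValue('ReadOnly', 0, True, 0)
    Hidden = PropertyValue('Hidden', 0, True, 0)  # load without opening a window
    xDoc = desktop.loadComponentFromURL(docpath, "_blank", 0, (RO, Hidden))
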

    Otherwise the document briefly flashes up on the screen (probably on the web server's console) when you open it.
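    Once `xDoc` is loaded, the plain text itself comes from the document's main text object. A sketch, assuming `xDoc` is a Writer document as returned by the `loadComponentFromURL` call above; `document_text` is a hypothetical helper name, not something from the original answer or the API.

```python
def document_text(xDoc, close=True):
    """Return the plain body text of a Writer document loaded via
    desktop.loadComponentFromURL(), optionally closing it afterwards.
    (Hypothetical helper, not part of the OpenOffice API.)"""
    try:
        # XTextDocument.getText() returns the main text;
        # getString() returns its content as a single Python string.
        return xDoc.getText().getString()
    finally:
        if close:
            # XCloseable.close(): release the document so a long-running
            # office process doesn't keep every loaded file open.
            xDoc.close(True)
```

    Closing the document matters in a web app: the office process keeps running between requests, and each document loaded into a `"_blank"` frame stays open until it is closed.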
