Reading/Writing MS Word files in Python

Submitted by 拈花ヽ惹草 on 2019-12-17 03:38:31

Question


Is it possible to read and write Word (2003 and 2007) files in Python without using a COM object?
I know that I can:

# Raw string: in a normal string literal '\f' would be read as a form feed.
with open(r'c:\file.doc', 'w') as f:
    f.write(text)

but Word will read it as an HTML file, not as a native .doc file.


Answer 1:


I'd look into IronPython, which intrinsically has access to the Windows/Office APIs because it runs on the .NET runtime.




Answer 2:


See python-docx; its official documentation is available online.

This has worked very well for me.




Answer 3:


If you only want to read, it is simplest to use the Linux soffice command to convert the file to text, and then load the text into Python.
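The original snippet for this answer did not survive, so what follows is only a sketch of the idea, not the answerer's code. It assumes LibreOffice's `soffice` binary is on the PATH and that the converted text is UTF-8 (the actual encoding depends on the LibreOffice setup). The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir):
    """Build the soffice command line that converts a document to plain text."""
    return [
        "soffice",
        "--headless",                 # run without opening a window
        "--convert-to", "txt:Text",   # plain-text export filter
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def read_word_text(doc_path, out_dir):
    """Convert doc_path with soffice, then return the resulting text."""
    subprocess.run(build_convert_command(doc_path, out_dir), check=True)
    # soffice writes <original stem>.txt into out_dir.
    txt_path = Path(out_dir) / (Path(doc_path).stem + ".txt")
    # Assumption: UTF-8 output; undecodable bytes are replaced, not raised.
    return txt_path.read_text(encoding="utf-8", errors="replace")
```

Because the conversion is done by LibreOffice, this approach should work for both .doc and .docx input, at the cost of losing all formatting.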




Answer 4:


doc (Word 2003 in this case) and docx (Word 2007) are different formats: the former is a binary format, while the latter is essentially a ZIP archive of XML and image files. I would imagine that it is entirely possible to write docx files by manipulating the contents of those XML files. However, I don't see how you could read and write a doc file without some type of COM component interface.
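To illustrate the "archive of XML files" point, here is a minimal standard-library sketch that pulls paragraph text out of a .docx file. It assumes the main document part lives at `word/document.xml` and uses the WordprocessingML namespace, which is the usual layout; `docx_paragraphs` is a name invented for this example, and the approach ignores styles, headers, footnotes and the like.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(path):
    """Return the text of each <w:p> paragraph in the main document part."""
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; each run holds <w:t> nodes.
        pieces = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs
```

This only works for docx. A .doc file is not a ZIP archive, so `zipfile.ZipFile` would raise `zipfile.BadZipFile` on it.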



Source: https://stackoverflow.com/questions/188444/reading-writing-ms-word-files-in-python
