Read Word document in C#

Submitted by 大城市里の小女人 on 2019-12-09 01:19:45

Question


I want to read Word documents (both .doc and .docx) on a server. The server does not have Office installed, so I can't use COM objects, and commercial software is not an option either.

Is there a way to read Word documents (2003 and 2007) using only the Office tools?


Answer 1:


Unfortunately, there are no good free options for reading both .doc and .docx files. Reasonably priced commercial options are sparse too, although there are good but extremely expensive ones.

For reading .doc files, the only free option I'm aware of is POI for Java, which you can run in .NET using IKVM. However, Word support is in an experimental branch of POI's SVN repository, so I don't know how well it works.

http://poi.apache.org/

http://www.ikvm.net/

If you just want the text out of the .doc file and don't care about formatting, you can use the IFilter Win32 interface through P/Invoke.

For reading .docx files you can use Microsoft's Open XML SDK. Don't let "SDK" fool you, though: it is a very light abstraction over dealing with the XML directly, and it's almost as painful to use.

http://www.microsoft.com/downloads/en/details.aspx?FamilyId=C6E744E5-36E9-45F5-8D8C-331DF206E0D0&displaylang=en
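
To give a sense of what "dealing with the XML directly" involves, here is a minimal sketch that pulls plain text out of a .docx using only the .NET base class library. A .docx file is a ZIP package, and this sketch assumes the main document part sits at its usual location, word/document.xml (a package may place it elsewhere). The class and method names are hypothetical, and the code ignores formatting, headers, footers and nested content such as text boxes.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
static class DocxTextReader
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string ExtractText(Stream docxStream)
    {
        using (var zip = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumption: the main part is at the conventional path.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found.");

            using (Stream xml = entry.Open())
            {
                XDocument doc = XDocument.Load(xml);

                // Each w:p is a paragraph; its text lives in w:t elements.
                var paragraphs = doc.Descendants(W + "p")
                    .Select(p => string.Concat(
                        p.Descendants(W + "t").Select(t => t.Value)));

                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }
}
```

A call would look like `DocxTextReader.ExtractText(File.OpenRead("Sample.docx"))`. This only works for .docx; the binary .doc format is not a ZIP package.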




Answer 2:


For .docx, your free option is DocX. It is very advanced and easy to use. For .doc, I have not seen a free alternative.




Answer 3:


Another free option, for .docx files only, is the Open XML SDK.
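
As a rough illustration, reading the paragraph text of a .docx with the Open XML SDK might look like the sketch below. It assumes the DocumentFormat.OpenXml NuGet package is referenced; the class name OpenXmlSdkReader is hypothetical, and formatting, headers and footers are ignored.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the SDK.
static class OpenXmlSdkReader
{
    public static string ReadText(string path)
    {
        // Open the package read-only (second argument: isEditable = false).
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // InnerText concatenates the text of every run in a paragraph.
            var paragraphs = body.Descendants<Paragraph>().Select(p => p.InnerText);

            return string.Join(Environment.NewLine, paragraphs);
        }
    }
}
```

Usage would be `string text = OpenXmlSdkReader.ReadText("Sample.docx");`. The SDK does not read the older binary .doc format.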

For both .doc and .docx files you can use the free version of GemBox.Document if the files are relatively small; otherwise you'll need the pro version.
You can open and read any Word format with it in the same way, for example:

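using GemBox.Document;

// The free version is enabled with GemBox's free limited key.
ComponentInfo.SetLicense("FREE-LIMITED-KEY");

// The same Load call is used for every supported Word format.
var docxFile = DocumentModel.Load("Sample.docx");
var docFile = DocumentModel.Load("Sample.doc");
var rtfFile = DocumentModel.Load("Sample.rtf");

// Get the document's content as plain text.
var docxText = docxFile.Content.ToString();
// ...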


Source: https://stackoverflow.com/questions/5130911/read-word-document-in-c-sharp
