How to index Word 2003, 2007 and 2010 documents using Lucene.NET

Posted by 亡梦爱人 on 2019-12-03 03:41:04

You could use IFilter plugins to retrieve the text content of the documents and then index that text. The IFilter interface was originally part of Microsoft Indexing Service, but it is generally available for indexing documents.

I looked into the technology a couple of years ago and seem to remember that the filters for Office documents were either built into Windows or could be installed separately from the complete Office package, but I may be wrong here.

More about the IFilter technology can be found at IFilter at Wikipedia and IFilter at MSDN. You will have to look into P/Invoke, and you might get some inspiration from IFilter at pinvoke.net.

A sample in C# can be found at MSDN Code Gallery.
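
To illustrate the indexing half of this approach, here is a minimal sketch, assuming the Lucene.NET 3.0.3 API. The `extractText` delegate is a hypothetical placeholder for whatever IFilter wrapper you end up with (it is not a real library function), and the `OfficeIndexer` class and the `path` and `content` field names are chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using LuceneDirectory = Lucene.Net.Store.Directory;

public static class OfficeIndexer
{
    // extractText is an assumed hook: it should return the plain text of the
    // file at the given path, for example by calling an IFilter via P/Invoke.
    public static int IndexFiles(
        LuceneDirectory indexDir,
        IEnumerable<string> paths,
        Func<string, string> extractText)
    {
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
        int indexed = 0;

        using (var writer = new IndexWriter(
            indexDir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            foreach (string path in paths)
            {
                string text = extractText(path);
                if (string.IsNullOrEmpty(text))
                {
                    continue; // nothing extracted, so skip this file
                }

                var doc = new Document();
                // Store the path verbatim so search hits can be mapped back to files.
                doc.Add(new Field("path", path,
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
                // Analyze the extracted text for full-text search without storing it.
                doc.Add(new Field("content", text,
                    Field.Store.NO, Field.Index.ANALYZED));

                writer.AddDocument(doc);
                indexed++;
            }
            writer.Commit();
        }
        return indexed;
    }
}
```

With this shape, the same method should work for .doc and .docx files alike, since the only format-specific part is the extractor you pass in.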
