Search for keywords in Word documents and index them

Submitted by 安稳与你 on 2019-12-24 06:35:24

Question


I'm looking for a way to search inside Word documents and list the documents that match the search criteria. I'll describe the scenario in more detail here.

On a Windows system I have a bunch of folders. Each folder contains a lot of Word documents. I need an application that can search inside a specific folder for keywords that might occur in those Word documents, something like the FULLTEXT search that MySQL has.

So if I search for the keywords "microsoft" and "windows XP", I want it to list every Word document that contains one or more of those keywords.

Of course, the more often those keywords appear in a document, the higher it should rank in the resulting list.

Now my question is: is there a tool out there that does exactly this? Or am I better off writing such a tool myself in C#.NET? If so, which APIs should I look at?

PS. The documents are .doc and .docx files.


Answer 1:


It looks to me like you need a full-blown search engine, including parsing, indexing, ranking, and searching. That is probably not very pleasant to implement yourself. You could have a look at Apache Lucene.
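
To give a rough idea of what the indexing and ranking steps involve, here is a minimal Python sketch. It is not Lucene and not how Lucene scores results; it simply builds an inverted index over text that has already been extracted and ranks documents by the raw number of keyword occurrences, as the question describes. The function names and the sample documents are made up for illustration.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to a Counter of {document name: occurrences}."""
    index = defaultdict(Counter)
    for name, text in docs.items():
        for term in tokenize(text):
            index[term][name] += 1
    return index

def search(index, query):
    """Return (document, score) pairs, highest total keyword count first."""
    scores = Counter()
    for term in tokenize(query):
        for name, count in index.get(term, {}).items():
            scores[name] += count
    # Sort by descending score, then by name so ties are stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical, already-extracted document text (assumption for the demo).
docs = {
    "a.docx": "Microsoft Windows XP setup notes. Windows XP drivers.",
    "b.docx": "Notes about Linux.",
    "c.docx": "Microsoft Office tips.",
}
index = build_index(docs)
results = search(index, "microsoft, windows XP")
print(results)
```

Note that this treats "windows XP" as two separate words rather than a phrase, and it does no stemming or length normalization. A real search engine handles those details, which is much of the reason to use one instead of rolling your own.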




Answer 2:


There is a tool right under your nose. It's Windows Search and it has an API which should meet your needs perfectly.

You might have to install the filter packs to provide Office-specific indexing if you don't have Office installed.




Answer 3:


Indexing is available within Windows and can handle Word documents:

  • http://windows.microsoft.com/en-US/windows7/Improve-Windows-searches-using-the-index-frequently-asked-questions
  • make a windows highlight search in c#

If you want to build your own index, you can use IFilters to extract text from documents: How to extract text from MS office documents in C#
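
For the .docx half of the problem there is also a route that needs no IFilter at all, because a .docx file is a ZIP archive whose main body lives in `word/document.xml`. The Python sketch below is an alternative to the IFilter approach, not an example of it. It covers only body paragraphs (no headers, footers, or footnotes) and does not handle the binary .doc format. The `make_sample` helper is hypothetical and builds just enough of an archive for the demo; it is not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_text(source):
    """Return body paragraph text from a .docx (file path or file object)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join the w:t pieces.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)

def make_sample():
    """Build a tiny in-memory archive with only the part docx_text reads."""
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main">'
        "<w:body>"
        "<w:p><w:r><w:t>Microsoft </w:t></w:r>"
        "<w:r><w:t>Windows XP</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer

text = docx_text(make_sample())
print(text)
```

The extracted text can then be fed into whatever index you build. For .doc files you would still need an IFilter or another parser.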



Source: https://stackoverflow.com/questions/12054902/search-for-keywords-in-word-documents-and-index-them
