问题
I'm looking for a way to search in Word documents and show a result of documents that matched the search criteria. I'll try to describe the scenario in more detail here.
On a Windows system i have a bunch of folders. Each folder has alot of Word documents. Now i need an application that can search inside a specific folder for keywords that might occure in those word documents. Something like the FULLTEXT search that MySQL has.
So if i search for the following keywords: microsoft, windows XP
then i want it to list every Word document that contains one or more of those keywords.
Ofcourse, the more those keywords appear a document, the higher its rank should be in the resulting list.
Now my question is, is there such a tool out there that does exactly this? Or am i better of writing such a tool myself in C#.NET? If so, to what API's do i have to look?
PS. They are .doc
and .docx
files.
回答1:
Looks like you need a full-blown search engine to me, including parsing, indexing, ranking, search, etc. Probably not very pleasant to implement it yourself... You could have a look at Apache Lucene.
回答2:
There is a tool right under your nose. It's Windows Search and it has an API which should meet your needs perfectly.
You might have to install the filter packs to provide Office-specific indexing if you don't have Office installed.
回答3:
Indexing is available within Windows and can deal with Word documents :
- http://windows.microsoft.com/en-US/windows7/Improve-Windows-searches-using-the-index-frequently-asked-questions
- make a windows highlight search in c#
If you want to build your own index, you can use IFilters to extract text from documents : How to extract text from MS office documents in C#
来源:https://stackoverflow.com/questions/12054902/search-for-keywords-in-word-documents-and-index-them