How to index files such as .txt, .pdf, .doc, etc. using Lucene.net?

放荡痞女 submitted on 2019-12-08 14:14:28

Question


I am new to Lucene.net. How do I index files such as .txt, .pdf, and .doc using Lucene.net? And which file types can be indexed with Lucene.net?


Answer 1:


Lucene.net is agnostic about file formats: it indexes the text you hand it, so you must extract the text from each file yourself.

I would use IFilters to pull out the text in a document and then use Lucene.net to create the search index.

You can search codeproject.com for several articles about using IFilters with Lucene.net.
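The two steps described above (extract text, then index it) can be sketched as follows. This is only an illustration in Python, not Lucene.net code: `extract_txt`, `EXTRACTORS`, `ToyIndex`, and `index_files` are hypothetical names, the only extractor shown handles plain .txt files, and the in-memory inverted index merely stands in for a real Lucene.net index. An IFilter-based extractor for .pdf or .doc would be registered in the same table.

```python
import os
from collections import defaultdict

# Hypothetical extractor for plain-text files. Formats such as .pdf or .doc
# would need a real extractor (IFilter, Tika, ...) registered the same way.
def extract_txt(path):
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read()

# Maps a lowercase file extension to the function that extracts its text.
EXTRACTORS = {".txt": extract_txt}

def extract_text(path):
    """Step 1: return the file's text, or None if no extractor is registered."""
    ext = os.path.splitext(path)[1].lower()
    extractor = EXTRACTORS.get(ext)
    return extractor(path) if extractor else None

class ToyIndex:
    """Step 2 stand-in: a tiny in-memory inverted index (not Lucene.net)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Whitespace tokenization only; a real analyzer does much more.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), ()))

def index_files(paths, index):
    """Extract and index each file; return the paths that had no extractor."""
    skipped = []
    for path in paths:
        text = extract_text(path)
        if text is None:
            skipped.append(path)
        else:
            index.add(path, text)
    return skipped
```

Keeping extraction behind a per-extension table means the indexing code never needs to know which file formats exist, which mirrors the point that the index itself only ever sees text.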




Answer 2:


Before you index files, you need to extract their text properly. Neither Lucene nor Lucene.net does that. For text extraction you can use IFilter on Windows. However, IFilters may not be stable, and you have to go through COM, which has threading issues. In addition, matching different IFilters to different versions of document formats is real trouble.

http://www.codeproject.com/Articles/13391/Using-IFilter-in-C

www.ifilter.org

There are commercial alternatives for text extraction but they are really expensive.

http://www.isys-search.com/products/document-filters

http://www.oracle.com/us/technologies/embedded/025613.htm

Apache Tika is a good open-source alternative to the commercial ones. It is written in Java.

http://tika.apache.org/

I strongly recommend using Apache Solr/Lucene with a good Solr .NET client instead of Lucene.net. Solr has built-in Tika integration that will achieve what you want to do. You don't need to know Java to use Solr: it is a standalone web service that can run on a lightweight application server.

If you build a document search solution with Lucene.Net, you will run into many problems that have already been addressed in Solr.

http://www.lucidimagination.com/devzone/technical-articles/content-extraction-tika

http://wiki.apache.org/solr/ExtractingRequestHandler
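Because Solr is a web service, sending a document to its Tika-backed extraction handler is just an HTTP POST. A minimal sketch in Python, under several assumptions: the handler is enabled at `/update/extract`, Solr is reachable at a base URL such as `http://localhost:8983/solr`, and the core name and document id are placeholders. `build_extract_request` and `send` are hypothetical helper names; `literal.id` and `commit` are parameters described on the ExtractingRequestHandler wiki page linked above.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def build_extract_request(solr_base, core, doc_id, file_bytes, content_type):
    """Build (but do not send) a POST to Solr's /update/extract handler.

    solr_base and core are assumptions about your installation, for example
    "http://localhost:8983/solr" and a core you have created yourself.
    """
    # literal.id sets the document's id field; commit=true makes the
    # document searchable immediately (costly if done for every file).
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{solr_base.rstrip('/')}/{core}/update/extract?{params}"
    return Request(
        url,
        data=file_bytes,  # raw file content; Tika extracts the text
        headers={"Content-Type": content_type},
        method="POST",
    )

def send(request, timeout=30):
    """Send the request to a running Solr and return the HTTP status code."""
    with urlopen(request, timeout=timeout) as response:
        return response.status
```

With a Solr instance running, usage would look like `send(build_extract_request("http://localhost:8983/solr", "docs", "doc1", pdf_bytes, "application/pdf"))`, where `pdf_bytes` holds the file content. A Solr .NET client would issue an equivalent request from C#.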

There is a good discussion of Lucene vs. Solr here:

Search Engine - Lucene or Solr



Source: https://stackoverflow.com/questions/10855907/how-to-index-files-such-as-txt-pdf-doc-etc-using-lucene-net
