Lucene.NET - checking if document exists in index

不想你离开。 提交于 2019-12-25 02:28:31

问题


I have the following code, using Lucene.NET V4, to check if a file exists in my index.

bool exists = false;
IndexReader reader = IndexReader.Open(Lucene.Net.Store.FSDirectory.Open(lucenePath), false);
Term term = new Term("filepath", "\\myFile.PDF");
TermDocs docs = reader.TermDocs(term);
if (docs.Next())
{
   exists = true;
}

The file myFile.PDF definitely exists, but it always comes back as false. When I look at docs in debug, its Doc and Freq properties state that they "threw an exception of type 'System.NullReferenceException'.


回答1:


First of all, it's a good practice to use the same instance of the IndexReader if you're not going to consider deleted documents - it's going to perform better and it's thread-safe so you can make a static read-only field out of it (although, I can see that you're specifying false for readOnly parameter so in case this is intended, just ignore this paragraph).

As for your case, are you tokenizing filepath field values? Because if you are (e.g. by using StandardAnalyzer when indexing/searching), you will probably have problems finding values such as \myFile.PDF (with default tokenizer, the value is going to be split into myFile and PDF, not sure about the leading backslash).

Hope this helps.




回答2:


You may have analyzed the field "filepath" during indexing with an analyzer which tokenizes/changes the content. e.g. the StandardAnalyzer tokenizes, lowercases, removes stopwords if specified etc.

If you only need to query with the exact filepath like in your example use the KeywordAnalyzer during indexing for this field.

If you can't re-index at the moment you need to find out which analyzer is used during indexing and use it to create your query. You have two options:

  1. Use a query parser with the right analyzer and parse the query filepath:\\myFile.PDF. If the resultung query is a TermQuery you can use its term as you did in your example. Otherwise perform a search with the query.
  2. Use the Analyzer directly to create the terms from the TokenStream object. Again, if only one term, do it as you did, if multipe terms, create a phrase query.


来源:https://stackoverflow.com/questions/20993676/lucene-net-checking-if-document-exists-in-index

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!