Different Lucene search results with different search space sizes

Posted by 拟墨画扇 on 2019-12-23 23:22:03

Question


I have an application that uses Lucene for searching. The search space contains thousands of entries. Searching against these, I get only a few results, around 20 (which is OK and expected).

However, when I reduce my search space to just those 20 entries (i.e. I index only those 20 entries and disregard everything else, so that development is easier), I get the same 20 results but in a different order (and with different scores).

I tried disabling the norm factors via Field#setOmitNorms(true), but I still get different results.

What could be causing the difference in the scoring?

Thanks


Answer 1:


Please see the scoring documentation in Lucene's Similarity API. My bet is on the difference in idf between the two cases (both numDocs and docFreq are different). In order to know for sure, use the explain() function to debug the scores.
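To make the idf point concrete, here is a minimal sketch of how the value moves with index size. It assumes the classic TF-IDF similarity (DefaultSimilarity in older Lucene versions), whose idf is 1 + ln(numDocs / (docFreq + 1)); newer versions default to BM25, which uses a different formula. The class name IdfSketch and the numbers 20 and 5000 are hypothetical, chosen only to mirror the situation in the question.

```java
// Sketch only: reimplements the classic Lucene idf formula by hand,
// it does not call into Lucene itself.
final class IdfSketch {

    // Classic TF-IDF idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        long docFreq = 20;        // hypothetical: the term occurs in 20 documents
        long fullIndex = 5000;    // hypothetical: size of the full index
        long smallIndex = 20;     // hypothetical: index holding only the 20 hits

        // Same term, same docFreq, but a very different idf.
        System.out.println("idf in full index:  " + idf(docFreq, fullIndex));
        System.out.println("idf in small index: " + idf(docFreq, smallIndex));
    }
}
```

The term is rare in the large index and therefore weighted heavily; in the 20-document index it appears everywhere, so its idf drops to roughly 1. Omitting norms does not change this, because idf is computed from index-wide statistics rather than from the per-field norm.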

Edit: A code fragment for getting explanations:

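// Run the query, then ask the searcher to explain each hit's score.
TopDocs hits = searcher.search(query, searchFilter, max);
ScoreDoc[] scoreDocs = hits.scoreDocs;
for (ScoreDoc scoreDoc : scoreDocs) {
  // explain() breaks the score down into the factors that produced it.
  String explanation = searcher.explain(query, scoreDoc.doc).toString();
  Log.debug(explanation);
}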



Answer 2:


Scoring depends on all the documents in the index:

In general, the idea behind the Vector Space Model (VSM) is the more times a query term appears in a document relative to the number of times the term appears in all the documents in the collection, the more relevant that document is to the query.

Source: Apache Lucene - Scoring
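The dependence on the whole collection can change the ranking itself, not only the score values. The following is a simplified sketch, assuming the classic TF-IDF similarity with norms omitted, where a term's contribution to a document is sqrt(freq) * idf^2. It ignores coord, queryNorm and boosts, which do not affect the order here because both documents match exactly one query term. All names and numbers (RankFlipSketch, terms A and B, documents X and Y) are hypothetical.

```java
// Sketch only: hand-computed classic TF-IDF term scores, not a Lucene call.
final class RankFlipSketch {

    // Classic idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    // Contribution of one matching term: tf * idf^2, with tf = sqrt(freq).
    // idf appears twice: once in the query weight, once in the document weight.
    static double termScore(int freq, long docFreq, long numDocs) {
        double idf = idf(docFreq, numDocs);
        return Math.sqrt(freq) * idf * idf;
    }

    public static void main(String[] args) {
        // Hypothetical query "A OR B": A occurs in 18 documents, B in 2.
        // Document X contains A four times; document Y contains B once.
        for (long numDocs : new long[] {5000, 20}) {
            double x = termScore(4, 18, numDocs);
            double y = termScore(1, 2, numDocs);
            System.out.println("numDocs=" + numDocs
                    + "  X=" + x + "  Y=" + y
                    + "  first=" + (x > y ? "X" : "Y"));
        }
    }
}
```

With these assumed numbers, X ranks first in the 5000-document index and Y ranks first in the 20-document index, even though docFreq is identical in both. Shrinking numDocs compresses the idf of the common term far more than that of the rare term, so the relative weight of the two terms shifts.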



Source: https://stackoverflow.com/questions/1742124/different-lucene-search-results-using-different-search-space-size
