Understanding the relationship between boosting a document in Lucene at index time and its corresponding score at search time

Submitted by 末鹿安然 on 2019-12-03 22:08:08

Question


When indexing, I boost certain documents, but they do not appear at the top of the list of retrieved documents. I looked at the scores of those documents, and somehow the score of every retrieved document is NaN.

What is the relationship between the boost of a document at index time and its score at search time? I thought these would be correlated, and further, I thought I would get a wide range of scores in my ScoreDocs, not just NaN. If you can shed some light on this I would be grateful.

I have read http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/search/Similarity.html

and can't figure out what is missing.

Here is the simple boosting code:

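if (myCondition)
{
   // Raise this document's index-time boost slightly above the default of 1.0
   myDocument.SetBoost(1.1f);
}
// Add the same document instance that was boosted above
myIndexWriter.AddDocument(myDocument);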

Answer 1:


I'm going to take a wild guess here, since you haven't provided a sample of your search code, but a common reason why the score of retrieved docs is NaN is that you use a Sort. When sorting, the score of the documents is usually not needed, so score computation is disabled by default.

If you use a Sort for your search, and want the score, check the method setDefaultFieldSortScoring of the IndexSearcher class. This method allows you to enable scoring the documents in a search that uses a Sort.

http://lucene.apache.org/java/2_9_4/api/all/org/apache/lucene/search/IndexSearcher.html#setDefaultFieldSortScoring(boolean, boolean)
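To make the mechanism concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: the class SortScoringSketch, the types Doc and Hit, and the method searchSortedByTitle are hypothetical names invented for illustration, and the trackScores flag only mimics the role of the first argument of setDefaultFieldSortScoring. It assumes, as described above, that a sorted search leaves a NaN placeholder in the score slot unless score tracking is switched on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortScoringSketch {

    /** Hypothetical stand-in for an indexed document. */
    static final class Doc {
        final String title;     // the field the search sorts on
        final float relevance;  // what a scorer would compute, index-time boost included

        Doc(String title, float relevance) {
            this.title = title;
            this.relevance = relevance;
        }
    }

    /** Hypothetical stand-in for a search hit. */
    static final class Hit {
        final String title;
        final float score;

        Hit(String title, float score) {
            this.title = title;
            this.score = score;
        }
    }

    /** Returns hits ordered by title; scores are filled in only when trackScores is true. */
    static List<Hit> searchSortedByTitle(List<Doc> docs, boolean trackScores) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing((Doc d) -> d.title));

        List<Hit> hits = new ArrayList<>();
        for (Doc d : sorted) {
            // The sort order never needs the relevance value, so without
            // tracking the score slot only holds a NaN placeholder.
            hits.add(new Hit(d.title, trackScores ? d.relevance : Float.NaN));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(new Doc("b", 1.1f), new Doc("a", 1.0f));

        for (boolean trackScores : new boolean[] {false, true}) {
            System.out.println("trackScores=" + trackScores);
            for (Hit h : searchSortedByTitle(docs, trackScores)) {
                // NaN never equals itself, so test with Float.isNaN rather than ==.
                System.out.println("  " + h.title + " scored=" + !Float.isNaN(h.score));
            }
        }
    }
}
```

In Lucene 2.9 itself, the corresponding switch would be a call such as searcher.setDefaultFieldSortScoring(true, true) on the IndexSearcher before running the sorted search. Note that the order of the hits is still decided by the Sort either way, so a boosted document will not move to the top unless the sort is by relevance.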



Source: https://stackoverflow.com/questions/7771830/understanding-the-relationship-between-boosting-a-document-in-lucene-at-index-ti
