Lucene scoring: in what context is queryNorm used?

怎甘沉沦 submitted on 2019-12-04 08:21:55

Good question, I've wondered this myself. According to the ScoresAsPercentages argument, attempting to compare scores across different queries or indexes, or even scores for the same query and index at different times, is a bad idea, and I agree.

My understanding is that, while queryNorm doesn't make scores strictly comparable, it does help. They are closer to comparable with the default queryNorm than without it.
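To make that concrete, here is a rough sketch of what the default queryNorm does, assuming the classic TF-IDF similarity, where queryNorm = 1 / sqrt(sumOfSquaredWeights) and idf = 1 + ln(numDocs / (docFreq + 1)). The function names and index statistics below are made up for illustration, and boosts are left out. This is not Lucene's actual code.

```python
import math

def classic_idf(num_docs, doc_freq):
    # Classic TF-IDF idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def query_norm(term_weights):
    # queryNorm = 1 / sqrt(sum of squared term weights)
    sum_of_squared_weights = sum(w * w for w in term_weights)
    return 1.0 / math.sqrt(sum_of_squared_weights)

# Hypothetical index statistics, invented for this example.
NUM_DOCS = 10_000
query_a = [classic_idf(NUM_DOCS, 50)]                                  # one rare term
query_b = [classic_idf(NUM_DOCS, 50), classic_idf(NUM_DOCS, 4_000)]    # rare + common term

norm_a = query_norm(query_a)
norm_b = query_norm(query_b)

# After scaling by queryNorm, each query's weight vector has unit length,
# which is what pulls scores from different queries onto a similar scale.
len_a = math.sqrt(sum((w * norm_a) ** 2 for w in query_a))
len_b = math.sqrt(sum((w * norm_b) ** 2 for w in query_b))

print(norm_a, norm_b, len_a, len_b)
```

Since queryNorm is the same factor for every document matched by a given query, it never changes the ranking within that query. It only rescales the scores.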

I suppose it could also enable people to write their own similarity, and use this call to create normalized, comparable scores, using algorithms that work in their particular case.

There has been some discussion on dropping it, which you might find interesting.

I know the question is old but I had a similar problem. The reason why queryNorm was not the same on all search results is that documents can be in different shards and the queryNorm is constant only within the same shard.
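A small sketch of why this happens, assuming the classic TF-IDF formulas (idf = 1 + ln(numDocs / (docFreq + 1)), queryNorm = 1 / sqrt(sumOfSquaredWeights)) and a single-term query. The shard names and statistics are invented. The point is only that each shard computes idf from its own local counts.

```python
import math

def classic_idf(num_docs, doc_freq):
    # Classic TF-IDF idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def query_norm(idfs):
    # queryNorm = 1 / sqrt(sum of squared term weights)
    return 1.0 / math.sqrt(sum(v * v for v in idfs))

# Hypothetical per-shard statistics: (num_docs, doc_freq of the query term).
shards = {"shard_0": (1000, 10), "shard_1": (1000, 200)}

# Each shard uses only its local statistics, so queryNorm differs per shard.
per_shard = {
    name: query_norm([classic_idf(num_docs, doc_freq)])
    for name, (num_docs, doc_freq) in shards.items()
}

# With a single shard, the same documents share one set of statistics.
total_docs = sum(num_docs for num_docs, _ in shards.values())
total_df = sum(doc_freq for _, doc_freq in shards.values())
single_shard = query_norm([classic_idf(total_docs, total_df)])

print(per_shard, single_shard)
```

The same query therefore gets a different queryNorm depending on which shard a hit came from, while a single shard yields one value for all hits.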

From my understanding this problem can be solved in 2 ways:

  • naturally, when there is a lot of data, because term statistics then even out across shards

  • setting the number of shards to 1. Of course, this has consequences for performance.

    { "settings": { "number_of_shards" : 1 } }

See http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/relevance-is-broken.html
