Lucene scoring: in what context is queryNorm used?

Submitted by 六眼飞鱼酱① on 2019-12-06 03:52:22

Question


I am a little confused by Lucene's scoring strategy. I know that Lucene's scoring formula is:

score(q,d) = coord(q,d) · queryNorm(q) · Σ_{t in q} ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )
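To make the formula concrete, here is a minimal Python sketch of the classic (pre-7.0) Lucene TF-IDF score, assuming the DefaultSimilarity definitions: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), coord = matched query terms / query terms, norm = 1 / sqrt(number of terms in the field), and queryNorm = 1 / sqrt(sum over query terms of (idf(t) · boost(t))²). Field boosts, the top-level query boost, and the lossy one-byte encoding of norms are ignored, and all function names are illustrative rather than Lucene API.

```python
import math

# Minimal sketch of Lucene's classic TF-IDF score (DefaultSimilarity defaults).
# Assumptions: top-level query boost = 1, no field boosts, no norm quantization.

def tf(freq):
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def score(query, doc_terms, doc_freqs, num_docs):
    """query: term -> boost; doc_terms: tokens of the field; doc_freqs: term -> docFreq."""
    # queryNorm(q) = 1 / sqrt(sum over query terms of (idf(t) * boost(t))^2)
    sum_sq = sum((idf(doc_freqs[t], num_docs) * b) ** 2 for t, b in query.items())
    query_norm = 1.0 / math.sqrt(sum_sq)
    norm = 1.0 / math.sqrt(len(doc_terms))  # lengthNorm of the field
    total, matched = 0.0, 0
    for t, boost in query.items():
        freq = doc_terms.count(t)
        if freq == 0:
            continue
        matched += 1
        total += tf(freq) * idf(doc_freqs[t], num_docs) ** 2 * boost * norm
    coord = matched / len(query)  # fraction of query terms present in the doc
    return coord * query_norm * total

doc = ["lucene", "scoring", "lucene", "norm"]
print(score({"lucene": 1.0, "norm": 1.0}, doc, {"lucene": 3, "norm": 1}, 10))
```

For a single-term query the boost and one factor of idf cancel against queryNorm, leaving tf · idf · norm.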

I understand every component in this formula except queryNorm(q). The official documentation explains it as follows:

queryNorm(q) is a normalizing factor used to make scores between queries comparable. This factor does not affect document ranking (since all ranked documents are multiplied by the same factor), but rather just attempts to make scores from different queries (or even different indexes) comparable.
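The documentation's claim that queryNorm does not affect ranking follows from it being a single positive constant per query. A small Python sketch with made-up raw scores and a made-up sumOfSquaredWeights shows that the order of documents is unchanged while only the magnitudes shrink:

```python
import math

# Hypothetical raw scores (everything in the formula except queryNorm).
raw_scores = {"doc1": 4.2, "doc2": 9.7, "doc3": 1.3, "doc4": 6.0}

# Made-up sumOfSquaredWeights for the query; queryNorm = 1 / sqrt(of it).
sum_of_squared_weights = 17.5
query_norm = 1.0 / math.sqrt(sum_of_squared_weights)

normalized = {doc: s * query_norm for doc, s in raw_scores.items()}

def ranking(scores):
    """Document ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

print(ranking(raw_scores))  # ['doc2', 'doc4', 'doc1', 'doc3']
print(ranking(normalized))  # same order, smaller numbers
```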

Why would I need to compare scores between different queries? In other words, could you give an example of a context in which queryNorm(q) is useful?


Answer 1:


Good question; I've wondered this myself. According to this ScoresAsPercentages argument, comparing scores across different queries or indexes, or even scores for the same query and index at different times, is a bad idea, and I agree.

My understanding is that, while queryNorm doesn't really make scores strictly comparable, it does help: they are closer to comparable with the default queryNorm than without it.
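One concrete sense in which the default queryNorm helps: it cancels any uniform scaling of a query's term weights. The Python sketch below assumes the classic idf and queryNorm definitions (length norm and coord omitted for brevity, names illustrative) and scores the same document against a query and against the same query with every boost multiplied by 10:

```python
import math

# Sketch assuming Lucene's classic idf and queryNorm; length norm and coord omitted.

def idf(doc_freq, num_docs):
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def score(query, doc_tf, doc_freqs, num_docs, use_query_norm=True):
    """query: term -> boost; doc_tf: term -> frequency of the term in the document."""
    # Query-side weight of each term: idf(t) * boost(t).
    weights = {t: idf(doc_freqs[t], num_docs) * b for t, b in query.items()}
    qn = 1.0
    if use_query_norm:
        qn = 1.0 / math.sqrt(sum(w * w for w in weights.values()))
    # tf(t) * idf(t)^2 * boost(t) == tf(t) * idf(t) * weight(t)
    return qn * sum(
        math.sqrt(doc_tf.get(t, 0)) * idf(doc_freqs[t], num_docs) * w
        for t, w in weights.items()
    )

doc_tf = {"quick": 2, "fox": 1}
doc_freqs = {"quick": 5, "fox": 2}
q1 = {"quick": 1.0, "fox": 1.0}
q10 = {"quick": 10.0, "fox": 10.0}  # same query, every boost multiplied by 10

print(score(q1, doc_tf, doc_freqs, 100), score(q10, doc_tf, doc_freqs, 100))
print(score(q1, doc_tf, doc_freqs, 100, use_query_norm=False),
      score(q10, doc_tf, doc_freqs, 100, use_query_norm=False))
```

With queryNorm the two scores match; without it the boosted query scores ten times higher. It does nothing about differing term statistics between queries or indexes, which is why the scores only become closer to comparable, not strictly comparable.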

I suppose it could also let people write their own Similarity and use this call to produce normalized, comparable scores with algorithms that work for their particular case.

There has been some discussion on dropping it, which you might find interesting.




Answer 2:


I know the question is old, but I had a similar problem. The reason queryNorm was not the same across all search results is that documents can live in different shards, and queryNorm is constant only within a single shard.
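A sketch of why this happens, assuming each shard scores with its own term statistics and Lucene's classic idf: the same term gets a different idf on each shard, and therefore a different queryNorm. The two-shard layout and its document counts are hypothetical.

```python
import math

def idf(doc_freq, num_docs):
    # Classic Lucene idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical index: 10 documents split over two shards. Each shard
# computes idf from its own numDocs/docFreq, not from the whole index.
shards = {
    "shard_0": {"num_docs": 5, "doc_freq": 4},  # the term is common here
    "shard_1": {"num_docs": 5, "doc_freq": 1},  # and rare here
}

def per_shard_query_norms(shards, boost=1.0):
    """queryNorm of a one-term query, as each shard would compute it."""
    return {
        name: 1.0 / math.sqrt((idf(s["doc_freq"], s["num_docs"]) * boost) ** 2)
        for name, s in shards.items()
    }

for name, s in shards.items():
    print(name, "idf =", round(idf(s["doc_freq"], s["num_docs"]), 3))
print(per_shard_query_norms(shards))
```

An identical document would therefore score differently depending on which shard it lands in, even though the query and the document are the same.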

As I understand it, this problem can be solved in two ways:

  • naturally, when there is a lot of data, because each shard's term statistics then approach those of the whole index

  • by setting the number of shards to 1. Of course, this has consequences for performance.

    { "settings": { "number_of_shards" : 1 } }

See http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/relevance-is-broken.html
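Both remedies amount to making the statistics each document is scored against agree. The Python sketch below (hypothetical shard sizes; documents assumed to be spread evenly in the large index) measures how far each shard's local idf drifts from the idf that a single shard holding all the documents would compute:

```python
import math

def idf(doc_freq, num_docs):
    # Classic Lucene idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def pooled_idf(shards):
    """idf computed from every document at once, as a single shard would."""
    n = sum(s["num_docs"] for s in shards)
    df = sum(s["doc_freq"] for s in shards)
    return idf(df, n)

def max_idf_gap(shards):
    """Largest difference between a shard-local idf and the pooled idf."""
    g = pooled_idf(shards)
    return max(abs(idf(s["doc_freq"], s["num_docs"]) - g) for s in shards)

# Hypothetical layouts.
small = [{"num_docs": 5, "doc_freq": 4}, {"num_docs": 5, "doc_freq": 1}]
large = [{"num_docs": 50_000, "doc_freq": 2_500}, {"num_docs": 50_000, "doc_freq": 2_480}]
single = [{"num_docs": 10, "doc_freq": 5}]  # the number_of_shards: 1 case

print(round(max_idf_gap(small), 3))  # large gap between shards
print(round(max_idf_gap(large), 4))  # tiny gap: statistics have evened out
print(max_idf_gap(single))           # 0.0 by construction
```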



Source: https://stackoverflow.com/questions/16784938/lucene-scoring-in-what-context-is-querynorm-used
