How to make modifications to SOLR's tfidf similarity?

≯℡__Kan透↙ submitted on 2021-02-11 13:30:44

Question


I am searching over titles, so the mere presence of a word is sufficient; its frequency is not relevant, at least for my use case.

For example, the search query is: "board early with my pets"

The results I got are:

Result 1: Pets 2.3924026

Result 2: Pets Counts against in cabin pet limit 2.0538325

Result 3: Pets Preboarding allowed 1.6092906

Ideally I want result 3 to come out on top, which needs some external work. Result 1 is obvious and acceptable, but result 2 scores 2.05 because 'pet' occurs twice, which implies a higher tf value [2/4 after removing stop words]. My requirement is only to detect the presence of a word, not to take its frequency into account.

How can I achieve this?


Answer 1:


If you don't need phrase search or other functionality that depends on position data being indexed, you can set omitTermFreqAndPositions="true" on the field in question. In that case no position or frequency information is stored for the terms.
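
As a rough sketch, the attribute goes on the field definition in the schema. The field name "title" and the type "text_general" below are assumptions for illustration, not taken from the question:

```xml
<!-- Hypothetical field definition; the name "title" and the type "text_general" are assumptions -->
<field name="title" type="text_general" indexed="true" stored="true"
       omitTermFreqAndPositions="true"/>
```

A change like this only affects documents indexed after it is made, so existing documents would need to be reindexed.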

If that's not an option, you can create a dummy similarity class that extends DefaultSimilarity (named ClassicSimilarity in newer Lucene versions) and returns 1.0f for tf. An example of this can be found in "Solr Custom Similarity".
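
To see why a constant tf removes the frequency effect, here is a minimal, self-contained sketch of the arithmetic. It is a toy model and not Lucene's Similarity API: the class name FlatTfDemo, the helper termScore, and the idf value are all hypothetical, and the square-root tf is only assumed to mirror the classic TF-IDF formula.

```java
import java.util.function.IntToDoubleFunction;

// Toy model only: this is NOT Lucene's Similarity API, just the scoring arithmetic.
final class FlatTfDemo {

    // Classic TF-IDF style factor (assumed): grows with the square root of the term count.
    static final IntToDoubleFunction SQRT_TF = freq -> Math.sqrt(freq);

    // Presence-only factor: 1.0 if the term occurs at all, 0.0 otherwise.
    static final IntToDoubleFunction FLAT_TF = freq -> freq > 0 ? 1.0 : 0.0;

    // Score contribution of a single query term: tf(freq) * idf.
    // Length normalisation and query boosts are deliberately left out.
    static double termScore(IntToDoubleFunction tf, int freq, double idf) {
        return tf.applyAsDouble(freq) * idf;
    }

    public static void main(String[] args) {
        double idf = 1.5; // hypothetical value, for illustration only

        // With the square-root tf, a title containing the term twice scores higher.
        System.out.println("sqrt tf, freq=1: " + termScore(SQRT_TF, 1, idf));
        System.out.println("sqrt tf, freq=2: " + termScore(SQRT_TF, 2, idf));

        // With the flat tf, one or two occurrences contribute the same amount.
        System.out.println("flat tf, freq=1: " + termScore(FLAT_TF, 1, idf));
        System.out.println("flat tf, freq=2: " + termScore(FLAT_TF, 2, idf));
    }
}
```

In a real Solr setup the same idea would live in an overridden tf method of the similarity subclass. Other factors, such as field length normalisation, would still influence the final score.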

You can also configure different similarity classes for each field, allowing you to drop tf scoring for a single field.

A third option is to use the constant score operator (^=) for the part of your query that should have a constant score. I am not sure whether the edismax parser supports this.



Source: https://stackoverflow.com/questions/51264962/how-to-make-modifications-to-solrs-tfidf-similarity
