How to make modifications to Solr's TF-IDF similarity?

Question

I am searching over titles, so the mere presence of a word is sufficient; its frequency is not relevant, at least for my use case.

For example, the search query is: "board early with my pets"

The results I got are:

Result 1: Pets (score 2.3924026)

Result 2: Pets Counts against in cabin pet limit (score 2.0538325)

Result 3: Pets Preboarding allowed (score 1.6092906)

Ideally I want result 3 at the top, which will need some additional work. Result 1 is obvious and acceptable, but result 2 has a score of 2.05 because 'pet' is mentioned twice, which implies a higher tf value [2/4 after removing stop words]. My requirement is just to detect the presence of a word, not to take its frequency into account.

How can I achieve this?

Answer 1:

If you don't need phrase search or other functionality that depends on position data being indexed, you can set omitTermFreqAndPositions="true" on the field in question. In that case no position or frequency information is stored for the terms.

If that's not an option, you can create a dummy similarity class that extends DefaultSimilarity and returns 1.0f for tf. An example of this can be found in "Solr Custom Similarity".
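As a rough illustration of what flattening tf changes, the sketch below models only the tf term in plain Java, with no Lucene dependency. The names TfModel, classicTf and flatTf are hypothetical, and classicTf assumes the square-root tf used by Lucene's classic TF-IDF similarity. In a real Solr plugin the same one-line change would presumably live in an override of tf(float freq) in a subclass of DefaultSimilarity, compiled against the Lucene version your Solr ships with.

```java
// Standalone model of the tf component only; this is NOT Lucene code.
final class TfModel {
    // Assumed classic behaviour: tf grows with the square root of the raw frequency.
    static float classicTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Presence-only variant: any match contributes 1.0, however often the term occurs.
    static float flatTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        // Compare both variants for a term occurring 1, 2 and 4 times in a title.
        for (float freq : new float[] {1f, 2f, 4f}) {
            System.out.printf("freq=%.0f classic=%.3f flat=%.1f%n",
                    freq, classicTf(freq), flatTf(freq));
        }
    }
}
```

With the flat variant, a title that mentions a term twice gets the same tf contribution as one that mentions it once, which is the behaviour the question asks for. Other score components, such as length normalization and idf, are untouched by this change.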

You can also configure different similarity classes for each field, allowing you to drop tf scoring for a single field.

Another option is to use the constant score operator (^=) for the part of your query that should have a constant score. I'm not sure whether the edismax parser supports this.
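To make the constant score idea concrete, here is a minimal sketch that builds such a query string on the client side, assuming the standard Solr query parser and its ^= operator. The class name ConstantScoreQueryBuilder, the method build and the field name title are hypothetical, and the terms are assumed to be already tokenized and free of characters that would need escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: turns a field name and terms into a constant-score query string.
final class ConstantScoreQueryBuilder {
    // Appends ^=1 to every field:term clause so each matching term scores exactly 1.
    static String build(String field, List<String> terms) {
        return terms.stream()
                .map(term -> field + ":" + term + "^=1")
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Terms from the example query, with stop words already removed by the caller.
        System.out.println(build("title", Arrays.asList("board", "early", "pets")));
    }
}
```

Because every clause contributes a fixed score, a document's total then depends only on how many distinct query terms it matches, not on how often each term occurs.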

Source: https://stackoverflow.com/questions/51264962/how-to-make-modifications-to-solrs-tfidf-similarity

Tags

solr

lucene