How to sort SOLR spellCheck suggestions NOT by frequency?

Posted by 萝らか妹 on 2020-01-11 13:09:46

Question


If you search for "ahve" on my staging index, the first spellcheck correction is "the", because "the" appears more often than "have" in the index (500 documents indexed).
If you search for "ahve" on my local index, the first spellcheck correction is "have", because "have" appears more often than any other word in the index (21 documents indexed).
This is a sample dump returned from my staging index:

<lst name="ahve">
<int name="numFound">5</int>
<int name="startOffset">0</int>
<int name="endOffset">4</int>
<int name="origFreq">0</int>
<arr name="suggestion">
<lst>
<str name="word">the</str>
<int name="freq">112</int>
</lst>
<lst>
<str name="word">are</str>
<int name="freq">67</int>
</lst>
<lst>
<str name="word">have</str>
<int name="freq">44</int>
</lst>
<lst>
<str name="word">acne</str>
<int name="freq">10</int>
</lst>
<lst>
<str name="word">ache</str>
<int name="freq">3</int>
</lst>
</arr>
</lst>

Adding spellcheck.onlyMorePopular=true or spellcheck.onlyMorePopular=false does not change anything.
Is there a way to sort the returned suggestions by something other than frequency of appearance?


Answer 1:


By default, spellcheck suggestions are sorted by score (the string distance to the query term, Levenshtein by default) and then by frequency. The other built-in ordering sorts by frequency first and then by score.

You can specify your own ordering by writing a custom comparator class that implements Comparator<SuggestWord>. Then set the comparatorClass entry of the spellchecker in your solrconfig.xml to the fully qualified name of that class.

<lst name="spellchecker">
  <str name="name">freq</str>
  <str name="field">lowerfilt</str>
  <str name="spellcheckIndexDir">spellcheckerFreq</str>
  <!-- comparatorClass can be one of:
     1. score (default)
     2. freq (Frequency first, then score)
     3. A fully qualified class name
   -->
  <str name="comparatorClass">my.custom.ComparatorClass</str>
  <str name="buildOnCommit">true</str>
</lst>
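
Note that a spellchecker defined this way is only used when the request selects it by name. As a rough sketch, assuming the search component is registered under the name spellcheck and that the handler is called /spell (both are assumptions, not taken from the question), the handler defaults in solrconfig.xml could look like this:

```xml
<!-- Hypothetical request handler; the name "/spell" and the component
     name "spellcheck" are assumptions for illustration. -->
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <!-- Must match the <str name="name"> of the spellchecker above -->
    <str name="spellcheck.dictionary">freq</str>
    <!-- Maximum number of suggestions to return per term -->
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

The same parameters can also be passed on the query string instead of being set as defaults.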

A couple more suggestions:

  • The parameter spellcheck.onlyMorePopular doesn't affect sort ordering. When it is true, Solr returns only suggestions that would produce more hits than the original query term, even when the original term exists in the index and is spelled correctly. Use with caution.

  • Make sure to remove stopwords such as 'the', 'that', etc., by running your data through the StopFilterFactory in both the index and query analyzers of the field type that the spellchecker reads from.
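
The stopword suggestion above might be sketched as the following schema.xml fragment. The field type name textSpell, the tokenizer choice, and the stopwords.txt file name are assumptions for illustration; only the field name lowerfilt comes from the spellchecker configuration shown earlier.

```xml
<!-- Hypothetical field type; "textSpell" and "stopwords.txt" are assumed names. -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Drop stopwords so they never enter the spellcheck dictionary -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

<!-- The field the spellchecker is built from -->
<field name="lowerfilt" type="textSpell" indexed="true" stored="false" multiValued="true"/>
```

After changing the analyzer, the documents need to be reindexed and the spellcheck index rebuilt before words like 'the' disappear from the suggestions.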

See: http://wiki.apache.org/solr/SpellCheckComponent for more information.



Source: https://stackoverflow.com/questions/13490270/how-to-sort-solr-spellcheck-suggestions-not-by-frequency
