Need help deciding which type of spellchecker to use in Solr

Posted by 早过忘川 on 2020-01-03 03:21:26

Question


I have a list of cities in a MySQL database that is hooked up to a UI for autocompletion. I am currently using Solr 5.3.0, and the data is imported through scheduled delta imports. I have the following questions:

  1. I want to add spell checking to this feature. I tried using:

    1. DirectSolrSpellChecker
    2. IndexBasedSpellChecker
    3. FileBasedSpellChecker


    Of these three, only FileBasedSpellChecker gives suggestions that all exist in the db. For example, when searching for cologne I got results like

        {
          "responseHeader":{
            "status":0,
            "QTime":4,
            "params":{
              "q":"searchfield:kolakata",
              "indent":"true",
              "spellcheck":"true",
              "wt":"json"}},
          "response":{"numFound":0,"start":0,"docs":[]
          },
          "spellcheck":{
            "suggestions":[
              "cologne",{
                "numFound":4,
                "startOffset":12,
                "endOffset":19,
                "suggestion":["Cologne",
                  "Bologna",
                  "Cogne",
                  "Bastogne"]}],
            "collations":[
              "collation","searchfield:Cologne"]}}
    

    These cities are accurate matches and exist in the db/file.

    But when I use the other two, I get results like

        {
          "responseHeader":{
            "status":0,
            "QTime":4,
            "params":{
              "q":"searchfield:kolakata",
              "indent":"true",
              "spellcheck":"true",
              "wt":"json"}},
          "response":{"numFound":0,"start":0,"docs":[]
          },
          "spellcheck":{
            "suggestions":[
              "cologne",{
                "numFound":4,
                "startOffset":12,
                "endOffset":19,
                "suggestion":["Cologne",
                  "Cologn",
                  "Colognei"]}],
            "collations":[
              "collation","searchfield:Cologne"]}}
    

    These are cities that are not present in my db.

    FileBasedSpellChecker gives satisfactory results, but I am a little apprehensive about using it because I would need to update the file manually every time a city is added or removed. Also, I understand that using FileBasedSpellChecker is generally not advisable.

  2. I also need to make the suggestions searchable. Currently I access the docs returned in

    "responseHeader":{"response":{"docs":[<some-format>]}} 
    

    to search for results in that city, but now I want the suggester to return its results in the same <some-format> instead of plain strings, so that it integrates properly with the UI.

  3. One minor request is to sort the suggestions in ascending order of edit (Levenshtein) distance. This is not a hard requirement and is negotiable.
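To make the ordering in point 3 concrete, here is a minimal client-side sketch in Python. It assumes the suggestions have already been pulled out of the `spellcheck` part of the response as plain strings; `levenshtein` and `sort_suggestions` are hypothetical helper names, not part of Solr, and the comparison is case-insensitive by assumption.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def sort_suggestions(query: str, suggestions: List[str]) -> List[str]:
    """Order suggestions by ascending distance to the query.

    Ties are broken alphabetically so the order is deterministic.
    """
    q = query.lower()
    return sorted(
        suggestions,
        key=lambda s: (levenshtein(q, s.lower()), s.lower()),
    )


if __name__ == "__main__":
    print(sort_suggestions("cologne", ["Bastogne", "Cogne", "Cologne", "Bologna"]))
```

Sorting on the client keeps the Solr configuration untouched, at the cost of recomputing distances that the spellchecker may already have used internally.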

Edit: My solrconfig looks like this:

<requestHandler name="/select" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">10</int>
       <str name="df">searchfield</str>
       <str name="spellcheck">true</str>
       <str name="spellcheck.collate">true</str>
       <str name="spellcheck.dictionary">file</str>
       <str name="spellcheck.maxCollationTries">5</str>
       <str name="spellcheck.count">5</str>
     </lst>
     <arr name="last-components">
        <str>spellcheck</str>
     </arr>
</requestHandler>

and

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
        <str name="queryAnalyzerFieldType">text_ngram</str>
        <lst name="spellchecker">
                <str name="name">file</str>
                <str name="classname">solr.FileBasedSpellChecker</str>
                <str name="sourceLocation">spellings.txt</str>
                <str name="spellcheckIndexDir">./spellchecker</str>
        </lst>
  </searchComponent>
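The manual-update concern about `spellings.txt` could in principle be reduced by regenerating the file after each import. The following is only a sketch under assumptions: the city names are assumed to have been fetched from the database already, the file is assumed to hold one entry per line, and `write_spellings` is a hypothetical helper name. The spellchecker would still need to be rebuilt afterwards, for example with a request carrying `spellcheck.build=true`.

```python
import os
import tempfile
from typing import Iterable


def write_spellings(names: Iterable[str], path: str) -> int:
    """Write unique, trimmed names to `path`, one per line.

    The file is written to a temporary location first and then moved
    into place, so a reader never sees a half-written file.
    Returns the number of entries written.
    """
    unique = sorted({n.strip() for n in names if n and n.strip()})
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            for name in unique:
                handle.write(name + "\n")
        os.replace(tmp_path, path)  # atomic rename within one directory
    finally:
        # Clean up the temporary file if the rename did not happen.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
    return len(unique)


if __name__ == "__main__":
    # Illustrative input only; real names would come from the cities table.
    count = write_spellings(["Cologne", "Bologna", "Cogne"], "spellings.txt")
    print(count, "entries written")
```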

My schema looks like this:

<field name="name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="latlng" type="location" indexed="true" stored="true" multiValued="false" />
<field name="citycode" type="string" indexed="true" stored="true" multiValued="false" />
<field name="country" type="string" indexed="true" stored="true" multiValued="false" />
<field name="searchscore" type="float" indexed="true" stored="true" multiValued="false" />
<field name="searchfield" type="text_ngram" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="true" />
<defaultSearchField>searchfield</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
<copyField source="name" dest="searchfield"/>

Source: https://stackoverflow.com/questions/40743684/need-help-to-decide-between-the-type-of-spellchecker-to-use-in-solr
