Solr - Russian synonyms are not working

Posted by 老子叫甜甜 on 2020-03-24 23:22:58

Question


I am running Solr 4.8.0 on Ubuntu 12.04 LTS.

I have a field type in schema.xml that uses the solr.SynonymFilterFactory filter:

    <fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball" />
        <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
      </analyzer>
    </fieldType>

I have the following synonym mapping:

spidermen, superman, batman, бетмен, бетмэн, спайдермен, спайдермэн, супермен, супермэн, spiderman

I checked the encoding of the "synonyms.txt" file, and it is UTF-8.

Queries with English synonyms work fine. The problem is only with Russian synonyms: they do not work, and Solr ignores them. I have not been able to solve this myself.

Update after 30 minutes: somehow the words "бетмэн" and "спайдермэн" are found in search results, but "бетмен" and "спайдермен" are not.


Answer 1:


Try swapping the order of the synonym and Porter filters. As it stands, the synonym file is consulted after the stemmer has already chopped off the words' endings, so the stemmed tokens probably no longer match the entries in the file.

The Analysis screen in the admin web UI is a great tool for seeing what happens to the text as it passes through each individual filter.
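As a minimal sketch of the suggested swap, the field type from the question could be rearranged as shown here. It reuses the same filters and attributes as the original and only moves solr.SynonymFilterFactory ahead of solr.SnowballPorterFilterFactory; it has not been tested against the asker's index. The partial results in the question are consistent with this explanation, presumably because the Russian stemmer treats "-ен" as an ending but leaves "-эн" alone.

```xml
<!-- Sketch only: same filters as in the question, with the synonym filter moved ahead of the stemmer -->
<fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball"/>
    <!-- Synonyms are matched against whole, unstemmed words -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- Stemming runs last, on the original token and on any expanded synonyms -->
    <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
  </analyzer>
</fieldType>
```

Because a single analyzer element applies at both index and query time, existing documents would most likely need to be reindexed after a change like this before the results reflect the new filter order.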




Answer 2:


I have just written a small test for this case and found that stemming is what causes the issue. When I disable it, everything works smoothly; swapping it with the synonym filter helps as well.

Link to the test: https://github.com/MysterionRise/information-retrieval-adventure/blob/master/lucene5/src/main/scala/org/mystic/SynonymsAndStopwords.scala



Source: https://stackoverflow.com/questions/27128070/solr-russian-synonyms-are-not-working
