Ignore accents in Elasticsearch with Haystack

Submitted by 大憨熊 on 2020-01-14 02:34:00

Question


I am using Elasticsearch with Haystack to provide search. I want users to be able to search in languages other than English; for example, I am currently trying Greek.

How can I make searches ignore accents? For example, if I enter Ανδρέας (with accents), it returns matching results.

But when I enter Ανδρεας, it returns no results. The search should return results containing "Ανδρέας" as well as "Ανδρεας" (the second one is unaccented).
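To a search engine the two spellings are simply different strings. The following plain-Python check is an illustration only, not part of the Haystack or Elasticsearch setup (`decomposed_names` is a hypothetical helper). It decomposes both words with Unicode NFD and shows that the accented form carries one extra code point, a combining accent:

```python
import unicodedata


def decomposed_names(text: str) -> list:
    """Return the Unicode names of the code points of `text` after NFD decomposition."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", text)]


accented, plain = "Ανδρέας", "Ανδρεας"
print(accented == plain)  # False

# NFD splits the precomposed 'έ' into 'ε' followed by a combining accent;
# that combining mark is the only code point the unaccented word lacks.
extra = [name for name in decomposed_names(accented) if name not in decomposed_names(plain)]
print(extra)  # ['COMBINING ACUTE ACCENT']
```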

Can someone point out how to resolve the issue?

Please let me know if I should post my Elasticsearch settings, search indexes, etc.

EDIT:

Here are my index settings:

ELASTICSEARCH_INDEX_SETTINGS = {
     'settings': {
         "analysis": {
             "analyzer": {
                 "myanalyzer_search": {
                     "type": "custom",
                     "tokenizer": "standard",
                     "filter": [
                         "greek_lowercase_filter",
                         "my_stop_filter",
                         "greek_stem_filter",
                         "english_stem_filter",
                         "my_edge_ngram_filter",
                         "asciifolding"
                     ]
                 },
                 "myanalyzer_index": {
                     "type": "custom",
                     "tokenizer": "edgeNGram",
                     "filter": [
                         "greek_lowercase_filter",
                         "my_stop_filter",
                         "greek_stem_filter",
                         "english_stem_filter",
                         "my_edge_ngram_filter",
                         "asciifolding"
                     ]
                 },
             },
             "tokenizer": {
                 "my_edge_ngram_tokenizer": {
                     "type": "edgeNGram",
                     "min_gram": "2",
                     "max_gram": "18",
                     "token_chars": ["letter"]
                 }
             },
             "filter": {
                 "my_edge_ngram_filter": {
                     "type": "edgeNGram",
                     "min_gram": 3,
                     "max_gram": 18
                 },
                 "greek_stem_filter": {
                     "type": "stemmer",
                     "name": "greek"
                 },
                 "greek_lowercase_filter": {
                     "type": "lowercase",
                     "language": "greek"
                 },
                 "english_stem_filter": {
                     "type": "stemmer",
                     "name": "english"
                 },
                 "my_stop_filter": {
                     "type": "stop",
                     "stopwords": ["_greek_", "_english_"]
                 }
             }
         }
     }
}

This is in search_index.py:

from haystack import indexes

from .models import SellerProfile  # assumed location of the model


class ProfileIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.EdgeNgramField(document=True, use_template=True)
    title = indexes.CharField(model_attr='title')
    sorted_title = indexes.CharField(model_attr='title', indexed=False, stored=True)
    employment_history = indexes.EdgeNgramField(model_attr='employment_history', null=True)

    def get_model(self):
        return SellerProfile

    def index_queryset(self, using=None):
        return self.get_model().objects.all()


   .........

And here's the template:

{{ object.user.get_full_name }}
{{ object.title }}
{{ object.bio }}
{{ object.employment_history }}
{{ object.education }}

I am running queries like the following:

results = SearchQuerySet().model(Profile).autocomplete(text='Ανδρεας')

and

results = SearchQuerySet().model(Profile).autocomplete(text='Ανδρέας')
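Both queries run through `autocomplete()` against the `text` field, an `EdgeNgramField`, whose content is indexed as edge n-grams. The sketch below is a plain-Python illustration, not Haystack or Elasticsearch code: `edge_ngrams` is a hypothetical helper that mimics an edge n-gram filter using the `min_gram` of 3 and `max_gram` of 18 from `my_edge_ngram_filter` above. It shows that, when no accent folding is applied, the grams of the indexed word "ανδρέας" and of the query "ανδρεας" agree only on the prefixes before the accented letter.

```python
def edge_ngrams(token: str, min_gram: int = 3, max_gram: int = 18) -> list:
    """Return the prefixes of `token` whose lengths run from min_gram to max_gram."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


indexed = set(edge_ngrams("ανδρέας"))  # grams stored for the accented, lowercased word
query = edge_ngrams("ανδρεας")         # grams produced for the unaccented query

# Only the prefixes that end before the accented letter are shared.
print(sorted(set(query) & indexed, key=len))  # ['ανδ', 'ανδρ']
print("ανδρεας" in indexed)                   # False
```

Whether a partial overlap like this produces hits depends on how the query is analyzed and combined, but applying the same accent folding before the n-gram step on both the index and the query side removes the mismatch altogether.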

Thanks.


Answer 1:


You need to add the asciifolding token filter to both your index-time analysis and your query-time analysis: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-asciifolding-tokenfilter.html

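It converts characters outside the Basic Latin block to their ASCII equivalents where one exists (for example, é becomes e), so words can later be found whether or not the query contains accents. Note that Greek letters have no ASCII equivalent, so asciifolding on its own will not strip the Greek tonos.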
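The principle behind this answer is that the same folding runs at index time and at query time, so both spellings reduce to one token. Below is a minimal plain-Python sketch of that idea, not Elasticsearch code. `fold`, `build_index`, and `search` are hypothetical helpers, and `fold` uses Unicode NFD decomposition as a stand-in for an accent-folding token filter (it is not asciifolding itself).

```python
import unicodedata
from collections import defaultdict


def fold(token: str) -> str:
    """Lowercase `token` and drop combining accents (stand-in for an accent-folding filter)."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(docs: dict) -> dict:
    """Index time: map each folded token to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[fold(token)].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Query time: fold the query with the same function before the lookup."""
    return index.get(fold(query), set())


docs = {1: "Ανδρέας Νικολάου", 2: "Ανδρεας Παππάς", 3: "Café Athens"}
index = build_index(docs)
print(search(index, "Ανδρεας"))  # {1, 2}
print(search(index, "Ανδρέας"))  # {1, 2}
print(search(index, "cafe"))     # {3}
```

If the folding ran on only one side, the accented and unaccented forms would again map to different keys, which is why the filter belongs in both analyzers.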



Source: https://stackoverflow.com/questions/23593770/ignore-accents-in-elastic-search-with-haystack
