ignore accents in elastic search with haystack

问题

I am using elasticsearch along with haystack in order to provide search. I want user to search in language other than english. E.g. currently trying with Greek.

How can I ignore the accents while searching for anything. E.g. let's say if I enter Ανδρέας ( with accents), its returning results matched with it.

But when I enter Ανδρεας, its not returning any results. The search engine should bring any results that have "Ανδρέας" but also "Ανδρεας" as well (the second one is not accented).

Can someone point out how to resolve the issue?

Please let me know if I need post settings for elastic search, search_indexex, etc.

EDIT:

Here's my index settings:

ELASTICSEARCH_INDEX_SETTINGS = {
     'settings': {
         "analysis": {
             "analyzer": {
                 "myanalyzer_search": {
                     "type": "custom",
                     "tokenizer": "standard",
                     "filter": [
                         "greek_lowercase_filter",
                         "my_stop_filter",
                         "greek_stem_filter",
                         "english_stem_filter",
                         "my_edge_ngram_filter",
                         "asciifolding"
                     ]
                 },
                 "myanalyzer_index": {
                     "type": "custom",
                     "tokenizer": "edgeNGram",
                     "filter": [
                         "greek_lowercase_filter",
                         "my_stop_filter",
                         "greek_stem_filter",
                         "english_stem_filter",
                         "my_edge_ngram_filter",
                         "asciifolding"
                     ]
                 },
             },
             "tokenizer": {
                 "my_edge_ngram_tokenizer": {
                     "type": "edgeNGram",
                     "min_gram": "2",
                     "max_gram": "18",
                     "token_chars": ["letter"]
                 }
             },
             "filter": {
                 "my_edge_ngram_filter": {
                     "type": "edgeNGram",
                     "min_gram": 3,
                     "max_gram": 18
                 },
                 "greek_stem_filter": {
                     "type": "stemmer",
                     "name": "greek"
                 },
                 "greek_lowercase_filter": {
                     "type": "lowercase",
                     "language": "greek"
                 },
                 "english_stem_filter": {
                     "type": "stemmer",
                     "name": "english"
                 },
                 "my_stop_filter": {
                     "type": "stop",
                     "stopwords": ["_greek_", "_english_"]
                 }
             }
         }
     }
}

This is present into search_index.py:

class ProfileIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.EdgeNgramField(document=True, use_template=True)
    title = indexes.CharField(model_attr='title')
    sorted_title = indexes.CharField(model_attr='title', indexed=False, stored=True)
    employment_history = indexes.EdgeNgramField(model_attr='employment_history', null=True)

    def get_model(self):
        return SellerProfile

    def index_queryset(self, using=None):
        return self.get_model().objects.all()


   .........

And here's the template:

{{ object.user.get_full_name }}
{{ object.title }}
{{ object.bio }}
{{ object.employment_history }}
{{ object.education }}

I am doing query like following:

results = SearchQuerySet().model(Profile).autocomplete(text='Ανδρεας')

and

results = SearchQuerySet().model(Profile).autocomplete(text='Ανδρέας')

Thanks.

回答1:

You need to add asciifolding token filter to you analysis/query pipeline http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-asciifolding-tokenfilter.html

That basically strips any accents from your words so you can easily find them later with/without searching with accents.

来源：https://stackoverflow.com/questions/23593770/ignore-accents-in-elastic-search-with-haystack

标签

ElasticSearch

search-engine

django-haystack

non-ascii-characters