ElasticSearch: EdgeNgrams and Numbers

醉梦人生 2020-12-19 08:03

Any ideas on how EdgeNgram treats numbers?

I'm running Haystack with an ElasticSearch backend. I created an indexed field of type EdgeNgram. This field will conta

2 Answers
  •  情话喂你
    2020-12-19 08:32

    I found my way here trying to solve this same problem in Haystack + Elasticsearch. Following the hints from uboness and ComoWhat, I wrote an alternate Haystack engine that (I believe) makes EdgeNGram fields treat numeric strings like words. Others may benefit, so I thought I'd share it.

    from haystack.backends.elasticsearch_backend import ElasticsearchSearchEngine, ElasticsearchSearchBackend
    
    class CustomElasticsearchBackend(ElasticsearchSearchBackend):
        """
        The default ElasticsearchSearchBackend settings don't tokenize strings of digits the same way as words, so emplids
        get lost: the lowercase tokenizer is the culprit. Switching to the standard tokenizer and doing the case-
        insensitivity in the filter seems to do the job.
        """
        def __init__(self, connection_alias, **connection_options):
            # see http://stackoverflow.com/questions/13636419/elasticsearch-edgengrams-and-numbers
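            # DEFAULT_SETTINGS is shared at class level, so guard the append to avoid
            # adding 'lowercase' again each time a backend is instantiated.
            analyzer = self.DEFAULT_SETTINGS['settings']['analysis']['analyzer']['edgengram_analyzer']
            analyzer['tokenizer'] = 'standard'
            if 'lowercase' not in analyzer['filter']:
                analyzer['filter'].append('lowercase')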
            super(CustomElasticsearchBackend, self).__init__(connection_alias, **connection_options)
    
    class CustomElasticsearchSearchEngine(ElasticsearchSearchEngine):
        backend = CustomElasticsearchBackend
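
To see why the tokenizer swap matters, here is a rough pure-Python sketch of the two analysis chains. It is an approximation, not Elasticsearch's actual implementation: the `lowercase` tokenizer splits on anything that is not a letter (so digits vanish), while the `standard` tokenizer keeps digit runs as tokens. The function names, the sample string, and the gram sizes (2 to 15) are assumptions made for illustration.

```python
import re

def lowercase_tokenizer(text):
    """Approximate the `lowercase` tokenizer: split on non-letters, then lowercase."""
    # [^\W\d_]+ matches runs of letters only, so digits act as separators and are dropped.
    return [tok.lower() for tok in re.findall(r"[^\W\d_]+", text)]

def standard_tokenizer_lowercased(text):
    """Rough stand-in for the `standard` tokenizer followed by a `lowercase` filter."""
    # \w+ keeps letters and digits; the real tokenizer uses Unicode word-boundary rules.
    return [tok.lower() for tok in re.findall(r"\w+", text)]

def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the prefixes of `token` between min_gram and max_gram characters long."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def analyze(text, tokenizer):
    """Tokenize, then expand every token into its edge n-grams."""
    return [gram for tok in tokenizer(text) for gram in edge_ngrams(tok)]

sample = "Emplid 10345"  # hypothetical field value
old_terms = analyze(sample, lowercase_tokenizer)
new_terms = analyze(sample, standard_tokenizer_lowercased)
print(old_terms)  # only prefixes of "emplid"; the number never reaches the index
print(new_terms)  # prefixes of "emplid" plus prefixes of "10345"
```

With the letter-only tokenizer, a query for `103` has nothing to match, which is consistent with the numeric IDs getting "lost" as described in the docstring above.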
    
