How to sort on an analyzed/tokenized field in Elasticsearch?

Posted by 风流意气都作罢 on 2019-12-06 03:44:12

Question


We're storing a title field in our index and want to use the field for two purposes:

  1. We're analyzing it with an ngram filter so we can provide autocomplete and instant results.
  2. We want to be able to list results using an ASC sort on the title field rather than on score.

The index/filter/analyzer is defined like so:

array(
    'number_of_shards' => $this->shards,
    'number_of_replicas' => $this->replicas,
    'analysis' => array(
        'filter' => array(
            'nGram_filter' => array(
                'type' => 'nGram',
                'min_gram' => 2,
                'max_gram' => 20,
                'token_chars' => array('letter','digit','punctuation','symbol')
            )
        ),

        'analyzer' => array(
            'index_analyzer' => array(
                'type' => 'custom',
                'tokenizer' =>'whitespace',
                'char_filter' => 'html_strip',
                'filter' => array('lowercase','asciifolding','nGram_filter')
            ),
            'search_analyzer' => array(
                'type' => 'custom',
                'tokenizer' =>'whitespace',
                'char_filter' => 'html_strip',
                'filter' => array('lowercase','asciifolding')
            )
        )
    )
),

The problem we're experiencing is unpredictable results when we sort on the title field. After a little searching, we found this at the end of the sort reference page for Elasticsearch (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html#_memory_considerations):

For string based types, the field sorted on should not be analyzed / tokenized.

How can we both analyze the field and sort on it later? Do we need to store the field twice, with one copy using not_analyzed, in order to sort? Since _source also stores the title value in its original state, can that not be used for sorting?


Answer 1:


You can use the built-in Multi Field Type in Elasticsearch. From the documentation:

The multi_field type allows to map several core_types of the same value. This can come very handy, for example, when wanting to map a string type, once when it’s analyzed and once when it’s not_analyzed.

In the Elasticsearch Reference, see the String Sorting and Multi Fields guide for how to set up what you need.
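As a rough sketch of what that setup could look like in the 1.X mapping syntax, the title field keeps the ngram analyzers from the question and gains a not_analyzed sub-field that is used only for sorting. The type name "item", the sub-field name "raw" and the query text are assumptions made for illustration; they do not come from the question.

```php
<?php
// Mapping sketch (Elasticsearch 1.X syntax). "item" and "raw" are
// hypothetical names chosen for this example.
$mapping = array(
    'item' => array(
        'properties' => array(
            'title' => array(
                'type' => 'string',
                // Analyzers defined in the index settings from the question
                'index_analyzer' => 'index_analyzer',
                'search_analyzer' => 'search_analyzer',
                'fields' => array(
                    // Untokenized copy of the same value, used for sorting
                    'raw' => array(
                        'type' => 'string',
                        'index' => 'not_analyzed'
                    )
                )
            )
        )
    )
);

// Search body sketch: match against the analyzed field,
// sort on the not_analyzed sub-field instead of on score.
$search = array(
    'query' => array('match' => array('title' => 'elas')),
    'sort' => array(
        array('title.raw' => array('order' => 'asc'))
    )
);

// Print both bodies as JSON for inspection
echo json_encode($mapping), "\n", json_encode($search), "\n";
```

Because the sub-field is not analyzed, sorting on title.raw compares the exact stored strings, so the order is case-sensitive.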

Please note that the Multi Field mapping configuration changed between Elasticsearch 0.90.X and 1.X. Use the guide that matches your version:

  • 0.90 Multi Field Type
  • 1.X Multi Field Type
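To illustrate the difference, here is a sketch of the older 0.90.X form, where the field itself is declared with the multi_field type and the default sub-field carries the field's own name. The type name "item" and the sub-field name "raw" are assumptions for illustration; check the exact options against the 0.90 guide above.

```php
<?php
// Mapping sketch (Elasticsearch 0.90.X syntax). "item" and "raw" are
// hypothetical names chosen for this example.
$mapping090 = array(
    'item' => array(
        'properties' => array(
            'title' => array(
                'type' => 'multi_field',
                'fields' => array(
                    // Default sub-field: same name as the parent, analyzed
                    'title' => array(
                        'type' => 'string',
                        'index_analyzer' => 'index_analyzer',
                        'search_analyzer' => 'search_analyzer'
                    ),
                    // Untokenized copy, addressed as "title.raw" when sorting
                    'raw' => array(
                        'type' => 'string',
                        'index' => 'not_analyzed'
                    )
                )
            )
        )
    )
);

// Print the mapping body as JSON for inspection
echo json_encode($mapping090), "\n";
```

In 1.X the same result is expressed by keeping the field's type as string and adding a fields entry to it, so the multi_field type is no longer needed.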


Source: https://stackoverflow.com/questions/23273329/how-to-sort-on-analyzed-tokenized-field-in-elasticsearch
