Indexing a comma-separated value field in Elasticsearch

暗喜 2020-12-30 10:40

I'm using Nutch to crawl a site and index it into Elasticsearch. My site has meta-tags, some of which contain a comma-separated list of IDs (that I intend to use for search…

1 answer
  •  感情败类
    2020-12-30 11:28

    Create a custom analyzer that splits the indexed text into tokens at each comma.
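
A rough way to see what the "comma" analyzer does to a field value: the Python sketch below mimics a pattern tokenizer with the pattern "," by using re.split. It is an approximation, not Elasticsearch code, and it assumes the tokenizer discards empty tokens and does not trim whitespace; comma_tokenize is a hypothetical helper name.

```python
import re


def comma_tokenize(text: str, pattern: str = ",") -> list[str]:
    """Approximate a pattern tokenizer: split on `pattern`, drop empty tokens.

    Hypothetical helper for illustration only. Whitespace around the
    separators is kept, because only the regex match itself is removed.
    """
    return [token for token in re.split(pattern, text) if token]


print(comma_tokenize("1,2,3"))             # ['1', '2', '3']
print(comma_tokenize("1,,2"))              # ['1', '2']
print(comma_tokenize("1, 2"))              # ['1', ' 2']
print(comma_tokenize("1, 2", r"\s*,\s*"))  # ['1', '2']
```

One consequence worth checking against your data: with the pattern ",", a value such as "1, 2" yields the token " 2" (with a leading space), so a term filter for "2" would not match it. If your meta-tags contain spaces after commas, a pattern such as \s*,\s* (written "\\s*,\\s*" inside the JSON settings) is one option you might try.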

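    Then you can search. If you don't care about relevance, you can use a filter to search through your documents: a filter only decides whether a document matches and does not compute a score. My example uses a term filter, which looks up the exact, unanalyzed term (e.g. "6") among the tokens the comma analyzer produced at index time.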
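
The matching behaviour of a term filter can be modelled in a few lines of Python. This in-memory sketch assumes the sample documents from the example in this answer (contentType values "1,2,3", "3,4" and "1,6"), splits each value on commas the way the custom analyzer does, and treats the search term as an exact, unanalyzed token; tokens and term_filter are hypothetical names, and nothing here calls Elasticsearch.

```python
# Hypothetical in-memory model of a term filter over comma-analyzed values.

# Sample documents mirroring the answer's example: document id -> contentType.
DOCS = {1: "1,2,3", 2: "3,4", 3: "1,6"}


def tokens(value: str) -> set[str]:
    """Approximate the custom "comma" analyzer: split on commas, drop empty tokens."""
    return {token for token in value.split(",") if token}


def term_filter(docs: dict[int, str], term: str) -> list[int]:
    """Return ids of documents whose analyzed value contains `term` exactly.

    The term is not analyzed and no relevance score is computed: a document
    either matches or it does not, so the result is simply a sorted id list.
    """
    return sorted(doc_id for doc_id, value in docs.items() if term in tokens(value))


print(term_filter(DOCS, "6"))    # [3]
print(term_filter(DOCS, "3"))    # [1, 2]
print(term_filter(DOCS, "1,6"))  # []
```

Because the term itself is not analyzed, searching for "1,6" finds nothing: that string was split into "1" and "6" at index time.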

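    Below is how to do this with the Sense plugin. Note that these requests use older Elasticsearch syntax: the "string" field type and mapping types such as "yourtype" were removed in later versions in favour of "text"/"keyword" fields and typeless APIs.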

    DELETE testindex
    
    PUT testindex
    {
        "settings" : {
            "analysis" : {
                "tokenizer" : {
                    "comma" : {
                        "type" : "pattern",
                        "pattern" : ","
                    }
                },
                "analyzer" : {
                    "comma" : {
                        "type" : "custom",
                        "tokenizer" : "comma"
                    }
                }
            }
        }
    }
    
    PUT /testindex/_mapping/yourtype
    {
        "properties" : {
            "contentType" : {
                "type" : "string",
                "analyzer" : "comma"
            }
        }
    }
    
    PUT /testindex/yourtype/1
    {
        "contentType" : "1,2,3"
    }
    
    PUT /testindex/yourtype/2
    {
        "contentType" : "3,4"
    }
    
    PUT /testindex/yourtype/3
    {
        "contentType" : "1,6"
    }
    
    GET /testindex/_search
    {
        "query": {"match_all": {}}
    }
    
    GET /testindex/_search
    {
        "query": {
            "constant_score": {
                "filter": {
                    "term": {
                        "contentType": "6"
                    }
                }
            }
        }
    }
    

    Hope it helps.
