Why is an HTML tag searchable even though it was filtered in Elasticsearch?

Anonymous (unverified), submitted on 2019-12-03 02:33:02

Question:

I am new to Elasticsearch and was testing the html_strip filter. Ideally, I should not be able to search on HTML tags. These are the steps I followed.

Index:

curl -XPOST 'localhost:9200/foo/test/_analyzer?tokenizer=standard&char_filters=html_strip' -d '{ "content" : "<title>Dilip Kumar</title>" }'

Search:

http://localhost:9200/foo/test/_search?tokenizer=standard&char_filters=html_strip&q=title 

Result:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2169777,
    "hits": [
      {
        "_index": "foo",
        "_type": "test",
        "_id": "_analyzer",
        "_score": 0.2169777,
        "_source": {
          "content": "<title>Dilip Kumar</title>"
        }
      }
    ]
  }
}

UPDATE: As suggested, I used the following mapping and repeated the steps above after deleting the existing index. However, I am still able to search for the markup.

curl -XPUT "http://localhost:9200/foo" -d '
{
  "foo": {
    "settings": {
      "analysis": {
        "analyzer": {
          "html_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "standard",
              "lowercase",
              "stop",
              "asciifolding"
            ],
            "char_filter": [
              "html_strip"
            ]
          },
          "whitespace_analyzer": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [
              "standard",
              "lowercase",
              "stop",
              "asciifolding"
            ]
          }
        }
      }
    },
    "mappings": {
      "test": {
        "properties": {
          "content": {
            "type": "string",
            "index_analyzer": "html_analyzer",
            "search_analyzer": "whitespace_analyzer"
          }
        }
      }
    }
  }
}'

Answer 1:

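You need to define the analyzer in the mapping before indexing. This ensures that every document indexed passes through that analyzer, so all the tags are stripped out before the terms are stored. In your case, you applied the analyzer while querying, which affects only your search phrase and not the data you are searching. Note also that the hit in your result has the _id "_analyzer": the POST was handled as an ordinary index request for a document with that ID, so the content was indexed with the tags still in it.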
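The difference between index-time and query-time analysis can be illustrated with a small toy model. This is only a sketch in plain Python, not Elasticsearch code: strip_html and tokenize are hypothetical stand-ins that roughly imitate the html_strip char filter and the standard tokenizer with lowercasing, and the inverted index is a simple dictionary.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for Elasticsearch components (assumptions, not the real implementation).
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"\w+")


def strip_html(text):
    """Rough imitation of the html_strip char filter: replace tags with a space."""
    return TAG_RE.sub(" ", text)


def tokenize(text):
    """Rough imitation of the standard tokenizer plus a lowercase filter."""
    return [token.lower() for token in WORD_RE.findall(text)]


def default_analyzer(text):
    """No char filter: tag names survive as ordinary terms."""
    return tokenize(text)


def html_analyzer(text):
    """Char filter first, then tokenizer: tag names never become terms."""
    return tokenize(strip_html(text))


def build_index(docs, analyzer):
    """Build a term -> set of document IDs mapping using the given index-time analyzer."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index[term].add(doc_id)
    return index


def search(index, query, analyzer):
    """Analyze the query, then look each resulting term up in the index."""
    hits = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return hits


docs = {"1": "<title>Dilip Kumar</title>"}

# Indexed without stripping: "title" is stored as a term.
plain_index = build_index(docs, default_analyzer)
# Indexed with stripping: only "dilip" and "kumar" are stored.
stripped_index = build_index(docs, html_analyzer)

# Stripping HTML only at query time does not change what was indexed.
print(search(plain_index, "title", html_analyzer))     # {'1'}
print(search(stripped_index, "title", html_analyzer))  # set()
print(search(stripped_index, "dilip", html_analyzer))  # {'1'}
```

In this model the query "title" contains no markup, so applying the HTML-stripping analyzer to it changes nothing; the match disappears only when the stored terms were produced by that analyzer.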

You can read more about creating a mapping in the Elasticsearch documentation.

I don't believe the search API accepts a request in this format:

http://localhost:9200/foo/test/_search?tokenizer=standard&char_filters=html_strip&q=title 

Instead, if you define the analyzer as follows, it should work fine:

curl -XPUT "http://localhost:9200/foo" -d '
{
  "foo": {
    "settings": {
      "analysis": {
        "analyzer": {
          "html_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "standard",
              "lowercase",
              "stop",
              "asciifolding"
            ],
            "char_filter": [
              "html_strip"
            ]
          },
          "whitespace_analyzer": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [
              "standard",
              "lowercase",
              "stop",
              "asciifolding"
            ]
          }
        }
      }
    },
    "mappings": {
      "test": {
        "properties": {
          "content": {
            "type": "string",
            "analyzer": "html_analyzer"
          }
        }
      }
    }
  }
}'

Here I made the analyzer the same for indexing and searching.


