Question
I'm having trouble with the Elasticsearch Grails plugin, specifically its highlighting feature.
It returns text containing HTML tags, which would not be a big problem on its own, but it also returns broken, cut-off HTML tags,
e.g. "href=google.de> Link <a".
Fragments like that can't easily be filtered out with a regex.
The solution to this seems to be a custom analyzer like this:
'{
  "index" : {
    "analysis" : {
      "analyzer" : {
        "test_1" : {
          "char_filter" : [
            "html_strip"
          ],
          "tokenizer" : "standard"
        },
        "test_2" : {
          "filter" : [
            "standard",
            "lowercase",
            "stop",
            "asciifolding"
          ],
          "char_filter" : [
            "html_strip"
          ],
          "tokenizer" : "standard"
        }
      }
    }
  }
}'
(Taken from "HTML Strip in Elastic Search".)
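For context, outside of Grails these settings would normally be sent to Elasticsearch as the `settings` of the request that creates the index. Below is a minimal Python sketch of that request, not the plugin's own mechanism. The index name passed in, the `http://localhost:9200` address, and the helper names `build_create_index_request` and `create_index` are assumptions made for illustration; only the `test_1` analyzer is included to keep it short.

```python
import json
import urllib.request

# Analyzer definition copied from the settings above (test_1 only, for brevity).
ANALYSIS_SETTINGS = {
    "index": {
        "analysis": {
            "analyzer": {
                "test_1": {
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",
                }
            }
        }
    }
}


def build_create_index_request(index_name, base_url="http://localhost:9200"):
    """Build (but do not send) the PUT request that creates the index.

    base_url is an assumed local node address, not taken from the question.
    """
    body = json.dumps({"settings": ANALYSIS_SETTINGS}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_index(index_name):
    """Send the request; needs a reachable node, so it is not called here."""
    with urllib.request.urlopen(build_create_index_request(index_name)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Analysis settings generally have to be supplied when the index is created, which is why I am looking for the equivalent hook in the plugin's configuration.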
The question is: how do I get the above into the Grails Elasticsearch plugin? (Any other solution would be welcome too.)
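As an example of the "any other solution" I have in mind, here is a rough Python sketch of cleaning the snippets after they come back. It only covers the simple case where a tag is cut off at the very start or end of a snippet, and it assumes the highlighter wraps matches in the default `<em>` tags. The function name `clean_highlight` is made up for illustration. Text that legitimately contains `<` or `>` would be mangled, which is why I would prefer a fix on the analysis side.

```python
import re

# Tail of a tag that was cut off at the start of the snippet, e.g. 'href=google.de>'
LEADING_FRAGMENT = re.compile(r"^[^<>]*>")
# Head of a tag that was cut off at the end of the snippet, e.g. '<a'
TRAILING_FRAGMENT = re.compile(r"<[^<>]*$")
# Any complete tag except the <em>/</em> highlight markers (assumed default)
NON_HIGHLIGHT_TAG = re.compile(r"<(?!/?em\b)[^<>]*>")


def clean_highlight(snippet):
    """Remove cut-off tag fragments and complete tags, keeping <em> markers."""
    snippet = LEADING_FRAGMENT.sub("", snippet)
    snippet = TRAILING_FRAGMENT.sub("", snippet)
    snippet = NON_HIGHLIGHT_TAG.sub("", snippet)
    return snippet.strip()


if __name__ == "__main__":
    print(clean_highlight("href=google.de> Link <a"))
```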
Source: https://stackoverflow.com/questions/42004164/how-to-filter-out-broken-html-tags-in-elasticsearchs-highlights