Solr wildcards and escaped characters together

Submitted by 梦想的初衷 on 2021-01-29 14:49:05

Question


I am trying to search in Solr but have a problem. For example, I have this phrase stored in Solr: [Karina K[arina ? ! & ?!a& m.malina m:malina 0sal0 0 AND. Now I want to search any request with wildcards (*). For example, I write *[* or *?* and expect Solr to return this phrase, but it doesn't work. What I tried:

  1. I can use escaped characters like K\[arina, but in this case I need to enter the whole phrase.
  2. But if I write K\[arin*, I get no results.
  3. OK, I tried K\[arin\*, and it worked.
  4. OK, then I put * at the start: \*\[arina, and it is fine.
  5. And finally \*\[arin\* doesn't work. Why? Where is the logic?
  6. Somewhere I read that I can use quotes, for example "*\[arin*" or even *[arin*, but no luck.
  7. And interestingly, I can search K\[arina as a whole word, or \?\!a\&, but not \? alone.

Answer 1:


When searching with wildcards, the regular analysis chain will not be invoked unless the configured filter is MultiTermAware. That means you're switching the search behavior around without knowing what's happening behind the scenes.

Lucene and Solr operate on tokens - tokens are usually single words (after some processing) from the input character stream, split ("tokenized") on different characters depending on the tokenizer for the field. A tokenizer will usually split on most non-alphanumeric characters, and defining the analysis chain explicitly will let you get the behavior you're looking for.
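As a toy model of that splitting behavior, here is a rough simulation; it is an assumption about the field's configuration, since real tokenizers such as StandardTokenizer follow the more nuanced Unicode UAX #29 word-segmentation rules:

```python
import re

# Rough stand-in for a tokenizer that splits on non-alphanumeric
# characters (an assumption; real Solr tokenizers are more nuanced).
def tokenize(text: str) -> list[str]:
    """Split on any run of non-alphanumeric characters, dropping empties."""
    return [t for t in re.split(r'[^0-9A-Za-z]+', text) if t]

print(tokenize("[Karina K[arina ?!a&"))  # ['Karina', 'K', 'arina', 'a']
```

Under this model, every special character in the indexed phrase becomes a token boundary, which is what the rest of the answer assumes.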

I'm guessing your tokenizer splits on most of the special characters you have in your string, so that K[arina effectively ends up being indexed as K and arina.

K\\[arina => K, arina (split on \ and [)

No token matching:

K\[arin* => nothing happens, since there is no token starting with K[arin

Escaping the wildcard means that the whole string gets sent to the tokenizer, effectively not making it a wildcard search, but a search with a string containing * instead:

K\[arin\* => K, arin -> K matches (and arin if an ngram filter is attached)
(one of your later examples show that there is no ngram filter)

Same behavior here, escaping the asterisk means that the whole string gets sent to the tokenizer instead of a wildcard search happening:

\*\[arina => arina -> arina matches

And when no token matches:

\*\[arin\* => arin -> there is no token matching arin, only arina.
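The difference between the wildcard query and the fully escaped one can be sketched with the same kind of toy tokenizer (again, a rough non-alphanumeric split, which is an assumption about the field's actual analysis):

```python
import re

def tokenize(text: str) -> list[str]:
    # Rough non-alphanumeric split, approximating the assumed tokenizer.
    return [t for t in re.split(r'[^0-9A-Za-z]+', text) if t]

index_tokens = tokenize("[Karina K[arina ?!a&")  # ['Karina', 'K', 'arina', 'a']

# Escaped wildcard K\[arin\*: the parser hands the literal "K[arin*" to
# analysis, which tokenizes to ['K', 'arin'] -- 'K' matches an indexed
# token, but 'arin' does not (only 'arina' was indexed).
assert 'K' in index_tokens
assert 'arin' not in index_tokens

# Unescaped wildcard K\[arin*: no analysis runs, so a raw indexed token
# would have to start with "K[arin" -- none does after tokenization.
assert not any(t.startswith('K[arin') for t in index_tokens)
```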

Case 6 is meant for phrases, which are multiple tokens with whitespace between them searched as a single match. I'll skip that for now.

The last case effectively ends up as an empty search, since the tokenizer splits on ? and leaves no usable tokens. Your first example on that line leaves the expected tokens, K and arina:

K\[arina => K, arina
\?\!a\& => a
\? => <nothing>
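The three outcomes above can be checked with the same toy tokenizer (a rough non-alphanumeric split, assumed to approximate the field's analysis); the inputs are what the query parser hands to analysis once the backslash escapes are stripped:

```python
import re

def tokenize(text: str) -> list[str]:
    # Rough non-alphanumeric split, approximating the assumed tokenizer.
    return [t for t in re.split(r'[^0-9A-Za-z]+', text) if t]

print(tokenize("K[arina"))  # ['K', 'arina'] -> matches the indexed tokens
print(tokenize("?!a&"))     # ['a']          -> matches the indexed 'a'
print(tokenize("?"))        # []             -> nothing left to search for
```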


Source: https://stackoverflow.com/questions/60540445/solr-wildcards-and-escaped-characters-together
