Solr wildcards and escaped characters together

Submitted by 梦想的初衷 on 2021-01-29 14:49:05

Question


I am trying to search in Solr but have a problem. For example, I have this phrase stored in Solr: [Karina K[arina ? ! & ?!a& m.malina m:malina 0sal0 0 AND. Now I want to search any request with wildcards (*). For example, I write *[* or *?* and expect Solr to return this phrase, but it doesn't work. What I tried:

  1. I can use escaped characters like K\[arina, but in this case I need to enter the whole phrase.
  2. But if I write K\[arin*, I get no results.
  3. OK, I tried K\[arin\*, and it worked.
  4. OK, then I put * at the start: \*\[arina, and it is fine.
  5. And finally \*\[arin\* doesn't work. Why? Where is the logic?
  6. Somewhere I read that I can use quotes, for example "*\[arin*" or even *[arin*, but no luck.
  7. And interestingly, I can search K\[arina as a whole word, or \?\!a\&, but not \? alone.

Answer 1:


When searching with wildcards, the regular analysis chain will not be invoked unless the configured filter is MultiTermAware. That means you're switching the search behavior around without knowing what's happening behind the scenes.

Lucene and Solr operate on tokens - tokens are usually single words (after some processing) from the input character stream, split ("tokenized") on different characters depending on the tokenizer for the field. A tokenizer will usually split on most non-alphanumeric characters, and defining the analysis chain explicitly will let you get the behavior you're looking for.
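As a toy model of that splitting behavior, here is a rough simulation; it is an assumption about the field's configuration, since real tokenizers such as StandardTokenizer follow the more nuanced Unicode UAX #29 word-segmentation rules:

```python
import re

# Rough stand-in for a tokenizer that splits on non-alphanumeric
# characters (an assumption; real Solr tokenizers are more nuanced).
def tokenize(text: str) -> list[str]:
    """Split on any run of non-alphanumeric characters, dropping empties."""
    return [t for t in re.split(r'[^0-9A-Za-z]+', text) if t]

print(tokenize("[Karina K[arina ?!a&"))  # ['Karina', 'K', 'arina', 'a']
```

Under this model, every special character in the indexed phrase becomes a token boundary, which is what the rest of the answer assumes.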

I'm guessing your tokenizer splits on most of the special characters you have in your string, so that K[arina effectively ends up being indexed as K and arina.

K\\[arina => K, arina (split on \ and [)

No token matching:

K\[arin* => nothing happens, since there is no token starting with K[arin

Escaping the wildcard means that the whole string gets sent to the tokenizer, effectively not making it a wildcard search, but a search with a string containing * instead:

K\[arin\* => K, arin -> K matches (and arin if an ngram filter is attached)
(one of your later examples show that there is no ngram filter)

Same behavior here, escaping the asterisk means that the whole string gets sent to the tokenizer instead of a wildcard search happening:

\*\[arina => arina -> arina matches

And when no token matches:

\*\[arin\* => arin -> there is no token matching arin, only arina.
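The difference between the wildcard query and the fully escaped one can be sketched with the same kind of toy tokenizer (again, a rough non-alphanumeric split, which is an assumption about the field's actual analysis):

```python
import re

def tokenize(text: str) -> list[str]:
    # Rough non-alphanumeric split, approximating the assumed tokenizer.
    return [t for t in re.split(r'[^0-9A-Za-z]+', text) if t]

index_tokens = tokenize("[Karina K[arina ?!a&")  # ['Karina', 'K', 'arina', 'a']

# Escaped wildcard K\[arin\*: the parser hands the literal "K[arin*" to
# analysis, which tokenizes to ['K', 'arin'] -- 'K' matches an indexed
# token, but 'arin' does not (only 'arina' was indexed).
assert 'K' in index_tokens
assert 'arin' not in index_tokens

# Unescaped wildcard K\[arin*: no analysis runs, so a raw indexed token
# would have to start with "K[arin" -- none does after tokenization.
assert not any(t.startswith('K[arin') for t in index_tokens)
```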

Case 6 is meant for phrases, which are multiple tokens with whitespace between them searched as a single match. I'll skip that for now.

The last case effectively ends up as an empty search, since the tokenizer splits on ? and leaves no usable tokens. Your first example on that line leaves the expected tokens, K and arina:

K\[arina => K, arina
\?\!a\& => a
\? => <nothing>
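The three outcomes above can be checked with the same toy tokenizer (a rough non-alphanumeric split, assumed to approximate the field's analysis); the inputs are what the query parser hands to analysis once the backslash escapes are stripped:

```python
import re

def tokenize(text: str) -> list[str]:
    # Rough non-alphanumeric split, approximating the assumed tokenizer.
    return [t for t in re.split(r'[^0-9A-Za-z]+', text) if t]

print(tokenize("K[arina"))  # ['K', 'arina'] -> matches the indexed tokens
print(tokenize("?!a&"))     # ['a']          -> matches the indexed 'a'
print(tokenize("?"))        # []             -> nothing left to search for
```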


Source: https://stackoverflow.com/questions/60540445/solr-wildcards-and-escaped-characters-together
