Question
I have a wildcard query that looks something like:
q=location:los a*
I'd like it to match "los angeles" and "los altos". A query like:
q=los*
works just fine, but as soon as I add whitespace I get no results. How can I use whitespace in my wildcard queries?
Answer 1:
I've recently come across this problem myself, and it seems that all you need to do is escape the space in your query. Your original query would be interpreted by Solr as something like this:
location:los id:a*
(assuming "id" is your default search field)
However, if you were to write your query as:
location:los\ a*
Then it would end up being parsed as:
location:los a*
And the above should yield the results that you desire (assuming your data is properly indexed).
Tip: Figuring all this out is simple. Just add &debugQuery=on to the end of the URL you use when submitting your query to see how Solr parsed it.
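The escaping step can also be done on the client before the query is sent. The following is only a rough sketch in Python (chosen for illustration); the helper names `escape_whitespace` and `build_select_url`, the host, and the core name `collection` are hypothetical and not part of the answer. It escapes whitespace only, not other query-syntax characters, and appends debugQuery=on:

```python
import re
from urllib.parse import urlencode

def escape_whitespace(term: str) -> str:
    """Put a backslash in front of every whitespace character.

    Only whitespace is handled here; other query-syntax characters
    are left untouched in this sketch.
    """
    return re.sub(r"(\s)", r"\\\1", term)

def build_select_url(base_url: str, field: str, prefix: str) -> str:
    """Build a select URL for a trailing-wildcard query with debug output on."""
    query = f"{field}:{escape_whitespace(prefix)}*"
    return base_url + "?" + urlencode({"q": query, "debugQuery": "on"})

# Hypothetical host and core name, used only for illustration.
url = build_select_url(
    "http://localhost:8983/solr/collection/select", "location", "los a"
)
print(escape_whitespace("los a"))  # los\ a
print(url)
```

The debug section of the response should then show whether the whole of "los a*" ended up in the location field.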
Answer 2:
One solution to your problem is to use the complex phrase query parser:
q={!complexphrase inOrder=true}location:"los a*"
To learn more about the Complex Phrase Query Parser, see: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
Answer 3:
Without seeing your config, I would suggest using a KeywordTokenizerFactory, since you are probably tokenizing on whitespace now.
Answer 4:
Might I suggest the Solr prefix query plugin, if you only need a wildcard at the end of the term, as we did: http://lucene.apache.org/solr/4_0_0/solr-core/org/apache/solr/search/PrefixQParserPlugin.html
Example usage:
http://localhost:8983/solr/collection/select?q={!prefix%20f=name}Bob%20Smi
This would match "Bob Smith" or "Bob Smit", but it would not be converted into a check for ("Bob" OR "Smi*"), as would happen with the first approach you might consider, along the lines of q=name:Bob%20Smi*
Hopefully this is of some help to you or someone else looking for a simple solution because I was banging my head against a wall for hours before I found this!
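As a small illustration of the example URL above, here is a sketch in Python that assembles the same request. The function name `prefix_select_url` is hypothetical, and leaving `{`, `}`, `!` and `=` unencoded simply mirrors the URL shown in this answer; a stricter HTTP client may want them percent-encoded as well.

```python
from urllib.parse import quote

def prefix_select_url(base_url: str, field: str, prefix: str) -> str:
    """Build a select URL that hands the raw prefix to the prefix query parser."""
    local_params = f"{{!prefix f={field}}}"
    # Keep {, }, ! and = readable, as in the answer's URL; spaces become %20.
    return base_url + "?q=" + quote(local_params + prefix, safe="{}!=")

print(prefix_select_url(
    "http://localhost:8983/solr/collection/select", "name", "Bob Smi"
))
# http://localhost:8983/solr/collection/select?q={!prefix%20f=name}Bob%20Smi
```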
Answer 5:
Assuming you use a whitespace tokenizer, the query q=location:los a* means that you are searching for documents that contain the word "los" and a word that starts with "a".
As far as I know, a query like this does not check whether one word (or term) appears before another.
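The splitting described here can be mimicked with a toy model in Python. This is not Solr's actual parser, only a sketch of the behaviour discussed in this thread: the query is split on unescaped whitespace, and any clause without a field name falls back to a default field (assumed to be `id`, as in the first answer). The function name `split_clauses` is hypothetical.

```python
def split_clauses(query: str, default_field: str = "id") -> list:
    """Toy model: split on unescaped whitespace, apply a default field."""
    clauses, current, i = [], "", 0
    while i < len(query):
        ch = query[i]
        if ch == "\\" and i + 1 < len(query):
            current += query[i + 1]  # keep the escaped character, drop the backslash
            i += 2
            continue
        if ch.isspace():
            if current:
                clauses.append(current)
            current = ""
        else:
            current += ch
        i += 1
    if current:
        clauses.append(current)
    # A clause without "field:" is searched in the default field.
    return [c if ":" in c else f"{default_field}:{c}" for c in clauses]

print(split_clauses("location:los a*"))    # ['location:los', 'id:a*']
print(split_clauses("location:los\\ a*"))  # ['location:los a*']
```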
Answer 6:
I think you should use a config like this:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="(\s+)" replacement="" replace="all" />
  </analyzer>
</fieldType>
and you have to preprocess the search keyword by removing its whitespace.
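A minimal sketch of that preprocessing step, in Python, might look like the following. The function name `to_prefix_query` is hypothetical; it lowercases the input and strips whitespace so that it lines up with what the index analyzer above stores, and it does not escape other query-syntax characters.

```python
import re

def to_prefix_query(field: str, user_input: str) -> str:
    """Lowercase the input, drop all whitespace, and append a trailing wildcard."""
    normalized = re.sub(r"\s+", "", user_input.lower())
    return f"{field}:{normalized}*"

print(to_prefix_query("location", "Los A"))  # location:losa*
```

With the field type above, "Los Angeles" would be indexed as the single term "losangeles", which the prefix "losa" matches.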
Answer 7:
This worked for me:
<fieldtype name="text_like" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="1000"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldtype>
and the query field:*some\ phrase*
(in a Java string literal, the \ must be escaped as \\).
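To see why an n-gram index makes such a query match, here is a toy model in Python of the index side only. It is an approximation under stated assumptions, not Solr's tokenizer: it emits every lowercased substring between the minimum and maximum gram size, spaces included, and the function name `ngrams` is hypothetical.

```python
from fnmatch import fnmatchcase

def ngrams(text: str, min_size: int = 3, max_size: int = 1000) -> set:
    """Toy n-gram generator: all lowercased substrings within the size bounds."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_size, min(max_size, len(text)) + 1)
        for i in range(len(text) - n + 1)
    }

terms = ngrams("Los Angeles")
print("los a" in terms)                                # True
print(any(fnmatchcase(t, "*los a*") for t in terms))   # True
```

Because the grams keep the space, a term such as "los a" exists in the index, so the wildcard pattern has something to match against.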
Answer 8:
I had the same problem in my project. Whenever I searched for a word that included whitespace, I got no results. So I replaced the whitespace with a hyphen ("-") both while indexing and while querying. Below is the schema.xml snippet I used to do so:
<fieldType name="text_ci" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="250"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([/\s+])" replacement="-" replace="all"
    />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.EdgeNGramTokenizerFactory" minGramSize="2" maxGramSize="250"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([/\s+])" replacement="-" replace="all"
    />
  </analyzer>
</fieldType>
Source: https://stackoverflow.com/questions/10023133/solr-wildcard-query-with-whitespace