Solr wildcard query with whitespace

纵然是瞬间 submitted on 2019-11-26 21:29:07

Question


I have a wildcard query that looks something like:

q=location:los a*

I'd like it to match "los angeles" and "los altos". A query like:

q=los*

works just fine, but as soon as I add whitespace I get no results. How can I use whitespace in my wildcard queries?


Answer 1:


I've recently come across this problem myself, and it seems that all you need to do is escape the space in your query. Your original query would be interpreted by Solr as something like this:

location:los id:a*

(assuming "id" is your default search field)

However, if you were to write your query as:

location:los\ a*

Then it would end up being parsed as:

location:los a*

And the above should yield the results that you desire (assuming your data is properly indexed).

Tip: Figuring all this out is simple. Just add &debugQuery=on to the end of the URL you use when submitting your query to see how Solr parsed it.
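The escaping described in this answer can be done on the client before the request is sent. The following is a minimal sketch, not a definitive implementation: the helper names (`escape_whitespace`, `build_query`, `build_select_url`) and the base URL are assumptions made for illustration, and only spaces are escaped, not the other special characters of the Solr query syntax.

```python
from urllib.parse import urlencode


def escape_whitespace(term: str) -> str:
    """Backslash-escape each space so the query parser keeps the term whole."""
    return term.replace(" ", "\\ ")


def build_query(field: str, prefix: str) -> str:
    """Build a wildcard query such as location:los\\ a* (the trailing * is left unescaped)."""
    return f"{field}:{escape_whitespace(prefix)}*"


def build_select_url(base_url: str, field: str, prefix: str) -> str:
    """Build a /select URL with debugQuery=on so the parsed query can be inspected."""
    params = {"q": build_query(field, prefix), "debugQuery": "on"}
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical base URL; adjust host and collection name to your setup.
print(build_query("location", "los a"))
print(build_select_url("http://localhost:8983/solr/collection", "location", "los a"))
```

The debug section of the response then shows whether the whole term reached the `location` field or was split at the space.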




Answer 2:


One solution to your problem is the complex phrase query parser:

q={!complexphrase inOrder=true}location:"los a*"

To learn more about the complex phrase query parser, check out this link: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
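For building this query string from code, a small sketch may help. The function name `complexphrase_query` is hypothetical; it only assembles the local-params prefix and the quoted phrase shown above, and it assumes the phrase itself contains no double quotes.

```python
from urllib.parse import urlencode


def complexphrase_query(field: str, phrase: str, in_order: bool = True) -> str:
    """Assemble a {!complexphrase} query with the phrase in double quotes."""
    flag = "true" if in_order else "false"
    return f'{{!complexphrase inOrder={flag}}}{field}:"{phrase}"'


q = complexphrase_query("location", "los a*")
print(q)                    # the raw query string
print(urlencode({"q": q}))  # the same query, URL-encoded for a request
```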




Answer 3:


Without seeing your config, I would say use a KeywordTokenizerFactory as you probably tokenize on whitespace now.




Answer 4:


Might I suggest the Solr prefix query parser plugin, if you only need a wildcard at the end of the term, as we did: http://lucene.apache.org/solr/4_0_0/solr-core/org/apache/solr/search/PrefixQParserPlugin.html

Example usage:

http://localhost:8983/solr/collection/select?q={!prefix%20f=name}Bob%20Smi

This would match "Bob Smith" or "Bob Smit", and it is not converted into a check of ("Bob" OR "Smi*"), which is what happens with the first solution you might consider, something along the lines of q=name:Bob%20Smi*
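The example URL above can be produced programmatically. This is a sketch only: `prefix_query_url` is a hypothetical helper, and the host, collection, and field name are taken from the example in this answer. It percent-encodes spaces as `%20` while leaving the local-params characters readable.

```python
from urllib.parse import quote


def prefix_query_url(base_url: str, field: str, prefix: str) -> str:
    """Build a /select URL that uses the {!prefix} parser on a single field."""
    local_params_query = f"{{!prefix f={field}}}{prefix}"
    # Keep { } ! = unencoded so the URL stays readable; spaces become %20.
    return f"{base_url}/select?q={quote(local_params_query, safe='{}!=')}"


print(prefix_query_url("http://localhost:8983/solr/collection", "name", "Bob Smi"))
```

Because the whole prefix is passed to the parser as one value, no backslash escaping of the space is needed here.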

Hopefully this is of some help to you or someone else looking for a simple solution because I was banging my head against a wall for hours before I found this!




Answer 5:


Assuming you have a whitespace tokenizer, the query q=location:los a* means that you are searching for documents containing the word "los" and a word that starts with "a".

Solr (as far as I know) cannot determine whether one word (or term) appears before another.




Answer 6:


I think you should use a config like this:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.PatternReplaceFilterFactory" pattern="(\s+)" replacement=""   replace="all" />
    </analyzer>
  </fieldType>

You also have to preprocess the search keyword on your side by removing its whitespace.
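A sketch of that client-side step, assuming the field is indexed with the analyzer above (lowercased, all whitespace removed). The names `normalize_keyword` and `wildcard_query` are hypothetical, and the field name `location` is borrowed from the question.

```python
import re


def normalize_keyword(user_input: str) -> str:
    """Lowercase the input and drop all whitespace, mirroring the index analyzer."""
    return re.sub(r"\s+", "", user_input.lower())


def wildcard_query(field: str, user_input: str) -> str:
    """Build a prefix wildcard query on the normalized keyword."""
    return f"{field}:{normalize_keyword(user_input)}*"


print(wildcard_query("location", "Los A"))
```

With this normalization, "los angeles" is indexed as the single token "losangeles", so a query for "losa*" can match it.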




Answer 7:


This worked for me:

<fieldtype name="text_like" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="1000"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    </analyzer>
</fieldtype>

together with the query field:*some\ phrase* (in a Java string literal, the \ must be escaped as \\).




Answer 8:


I had the same problem in my project. Whenever I searched for a word together with whitespace, I got no results. So I replaced the whitespace with a hyphen "-" both while indexing and while querying. Below is the schema.xml snippet I used to do so:

<fieldType name="text_ci" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="250"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="([/\s+])" replacement="-" replace="all"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.EdgeNGramTokenizerFactory" minGramSize="2" maxGramSize="250"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="([/\s+])" replacement="-" replace="all"/>
    </analyzer>
</fieldType>


Source: https://stackoverflow.com/questions/10023133/solr-wildcard-query-with-whitespace
