Solr: Partial email search with exact match

Submitted by 大兔子大兔子 on 2019-12-24 15:33:47

Question


I'm currently developing a search where users need to find people by first name, last name, or email. For the search I'm using Solr 4.0.0-ALPHA with the edismax query parser.

The problem is that when a user searches with a partial email address, I need to return only the results that contain exactly that partial address.

For example, the query

lastname@gmail

should return only users whose email contains "lastname@gmail", such as:

firstname.lastname@gmail.com

Instead, it currently matches every document that contains either "lastname" or "gmail", which in our database is a huge number of results, when only one would match "lastname@gmail". I know I can get the exact match by putting the query in double quotes, like "lastname@gmail", and I could of course force the email address into this format on the client before sending the search to Solr. But is it possible to do this in schema.xml instead?

Here is my current schema.xml

<schema name="example" version="1.5">
    <fields>
        <field name="id" type="string" indexed="true" stored="true" required="true" />
        <field name="firstName" type="string_ci" indexed="true" stored="true" />
        <field name="lastName" type="string_ci" indexed="true" stored="true" />
        <field name="email" type="string_email" indexed="true" stored="true" />
    </fields>

    <uniqueKey>id</uniqueKey>

    <types>
        <fieldType name="string" class="solr.StrField" sortMissingLast="true" />

        <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
            <analyzer>
                <tokenizer class="solr.KeywordTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
            </analyzer>
        </fieldType>

        <fieldType name="string_email" class="solr.TextField" sortMissingLast="true" omitNorms="true">
            <analyzer>
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.WordDelimiterFilterFactory" />
            </analyzer>
        </fieldType>
    </types>
</schema>

I know the issue is that I'm using StandardTokenizerFactory, which splits the email address into tokens, so the query is parsed like this:

<str name="parsedquery_toString">
+(lastName:lastname@gmail | id:lastname@gmail | (email:lastname email:gmail) | firstName:lastname@gmail)
</str>

What I want is something more like this, which is what I get when I put the query in double quotes ("lastname@gmail"):

<str name="parsedquery_toString">
+(lastName:lastname@gmail | id:lastname@gmail | email:"lastname gmail" | firstName:lastname@gmail)
</str>

Here is the search I'm doing:

/select?q=lastname@gmail&qf=id+firstName+lastName+email&defType=edismax&debugQuery=true


Answer 1:


I got the answer on the #solr IRC channel. Adding autoGeneratePhraseQueries="true" to the field type makes Solr treat the tokens produced from the query as a phrase, as if the query had been put in double quotes, and I now get the correct results.

<fieldType name="text_email" class="solr.TextField" sortMissingLast="true" omitNorms="true" autoGeneratePhraseQueries="true">
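
Applied to the schema from the question, the complete field type might look like the sketch below. It assumes the type keeps the name string_email that the email field already references (the line above calls it text_email) and that the analyzer chain stays as posted; the only change is the added autoGeneratePhraseQueries attribute.

<!-- Sketch only: same analyzer chain as in the question, plus autoGeneratePhraseQueries -->
<fieldType name="string_email" class="solr.TextField" sortMissingLast="true" omitNorms="true" autoGeneratePhraseQueries="true">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.WordDelimiterFilterFactory" />
    </analyzer>
</fieldType>

With this setting, a query term that the analyzer splits into several tokens (lastname@gmail becomes lastname and gmail) should be searched as the phrase email:"lastname gmail" rather than as two independent terms, which is the parsed query the question was asking for.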


Source: https://stackoverflow.com/questions/12101639/solr-partial-email-search-with-exact-match
