Question
When Solr builds the index, it indexes parts of e-mail addresses.
For example, if I have an e-mail address such as foo@bar.com, Solr indexes the words "foo" and "barcom".
I want to remove these words, but I don't know how. I tried modifying the schema.xml configuration file by adding this filter to my indexed field:
<filter class="solr.PatternReplaceFilterFactory" pattern=" (.*)@(.*) " replacement=" " replace="all"/>
However, it doesn't work.
Answer 1:
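You can have the tokenizer emit each e-mail address as a single token and then filter those tokens out by type. Note that useWhitelist has to be "false" (the default) so that the types listed in email_type.txt are removed rather than kept:
<fieldType name="emails" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- Keeps URLs and e-mail addresses as single tokens and tags them by type -->
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <!-- Drops every token whose type is listed in email_type.txt -->
    <filter class="solr.TypeTokenFilterFactory" types="email_type.txt" useWhitelist="false"/>
  </analyzer>
</fieldType>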
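The field type above relies on a types file that the answer does not show. As a minimal sketch, assuming the file lives in the core's conf directory next to schema.xml and that the tokenizer tags e-mail addresses with the type <EMAIL> (which is what UAX29URLEmailTokenizerFactory does as far as I know), email_type.txt would contain a single line:

```text
<EMAIL>
```

Keep in mind that useWhitelist="true" makes TypeTokenFilterFactory keep only the listed types, so removing e-mail addresses requires useWhitelist="false" (or leaving the attribute out). The Analysis screen in the Solr admin UI is a quick way to check which tokens survive for an input such as foo@bar.com.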
Source: https://stackoverflow.com/questions/20655719/remove-email-address-from-solr-indexing