Remove email addresses from Solr indexing

Submitted by 亡梦爱人 on 2019-12-24 21:55:27

Question


When Solr builds the index, it indexes parts of email addresses.

For example, if I have an email address like foo@bar.com, Solr indexes the words "foo" and "barcom".

I want to remove these words, but I don't know how. I tried modifying the configuration file schema.xml by adding this rule to my indexed field:

<filter class="solr.PatternReplaceFilterFactory" pattern=" (.*)@(.*) " replacement=" " replace="all"/>

However, it doesn't work.


Answer 1:


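You can have the tokenizer detect email addresses as single tokens and then blacklist tokens of that type:

<fieldType name="emails" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.TypeTokenFilterFactory" types="email_type.txt" useWhitelist="false"/>
  </analyzer>
</fieldType>

With useWhitelist="false" the filter drops every token whose type is listed in email_type.txt. Setting it to "true" does the opposite and keeps only those tokens.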
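The field type above depends on a types file that the answer does not show. As a sketch: UAX29URLEmailTokenizerFactory tags email tokens with the type <EMAIL>, so email_type.txt (placed in the core's conf directory) would contain that single line. This assumes the filter runs in blacklist mode (useWhitelist="false", which is also the default), so that listed types are removed rather than kept. The field name "content" below is hypothetical and only illustrates how the type might be attached to an indexed field.

```xml
<!-- Contents of conf/email_type.txt (one token type per line):

     <EMAIL>
-->

<!-- Hypothetical field using the "emails" field type defined above;
     "content" is an assumed name, not from the original question. -->
<field name="content" type="emails" indexed="true" stored="true"/>
```

After changing the schema, the existing documents need to be reindexed for the new analysis chain to take effect. The Analysis screen in the Solr admin UI can be used to check that foo@bar.com is emitted as one <EMAIL> token and then removed.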


Source: https://stackoverflow.com/questions/20655719/remove-email-address-from-solr-indexing
