Question
The fieldType config described in this question lets me detect currency (e.g. docs containing "$30"). However, we want to use the StandardTokenizerFactory rather than the WhitespaceTokenizerFactory, and with the StandardTokenizerFactory this config returns false positives (e.g. docs containing the digits 30 without the $ symbol). What is the solution?
Thanks
How do I find documents containing digits and dollar signs in Solr?
Answer 1:
Solved via a post to the Solr user group: http://lucene.472066.n3.nabble.com/How-to-use-the-StandardTokenizer-with-currency-td4308072.html#a4308097
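Here is my config. A char filter swaps each $ for the placeholder xxdollarxx before the StandardTokenizer runs (the tokenizer would otherwise discard the $), and a token filter restores the $ afterwards: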
<!-- VB - Just like text_general, but supports $ currency matching and autoGeneratePhraseQueries -->
<fieldType name="text_curr_3" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\$" replacement="xxdollarxx"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="xxdollarxx" replacement="\$" replace="all"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\$" replacement="xxdollarxx"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="xxdollarxx" replacement="\$" replace="all"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
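To see why the placeholder round trip in this config keeps "$30" distinct from a bare "30", here is a rough Python sketch of the same three steps. It is only an illustration, not Solr code: the function name `analyze` is made up, and the regex is a crude stand-in for StandardTokenizer that assumes the tokenizer keeps runs of letters and digits together and drops punctuation such as $. The synonym, stop-word, and word-delimiter filters are left out.

```python
import re

PLACEHOLDER = "xxdollarxx"  # same placeholder string as in the config above


def analyze(text):
    """Hypothetical, simplified imitation of the analyzer chain."""
    # Step 1 (like PatternReplaceCharFilterFactory): hide "$" behind an
    # alphanumeric placeholder so the tokenizer does not throw it away.
    masked = text.replace("$", PLACEHOLDER)
    # Step 2 (crude stand-in for StandardTokenizer, an assumption):
    # keep runs of ASCII letters/digits, drop everything else.
    tokens = re.findall(r"[A-Za-z0-9]+", masked)
    # Step 3 (like PatternReplaceFilterFactory + LowerCaseFilterFactory):
    # put the "$" back inside each token and lowercase it.
    return [t.replace(PLACEHOLDER, "$").lower() for t in tokens]


print(analyze("Costs $30 today"))  # the currency token keeps its "$"
print(analyze("Room 30"))          # the bare number stays a bare number
```

Because the indexed token for a price is "$30" while a plain number is indexed as "30", a query for "$30" no longer matches documents that only contain 30.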
Source: https://stackoverflow.com/questions/40877567/using-standardtokenizerfactory-with-currency