KeywordAnalyzer and LowerCaseFilter/LowerCaseTokenizer

死守一世寂寞 2020-12-19 12:11

I want to build my own analyzer that combines the behavior of KeywordAnalyzer and LowerCaseFilter.

That is, the same field should be indexed as a keyword (the entire stream as a single token) and that token should be lowercased.

2 Answers
  • 2020-12-19 12:59

    This should work:

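    import java.io.Reader;
    import org.apache.lucene.analysis.KeywordTokenizer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.ReusableAnalyzerBase;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.util.Version;

    public final class YourAnalyzer extends ReusableAnalyzerBase {

      @Override
      protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
        // KeywordTokenizer emits the entire input as a single token.
        final Tokenizer source = new KeywordTokenizer(reader);
        // LowerCaseFilter then lowercases that token.
        return new TokenStreamComponents(source, new LowerCaseFilter(Version.LUCENE_36, source));
      }
    }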
    
  • 2020-12-19 13:05

    In Lucene 3.6.2 it must look like this:

    import java.io.Reader;
    import org.apache.lucene.analysis.KeywordTokenizer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.ReusableAnalyzerBase;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.util.Version;
    
    public class YourAnalyzer extends ReusableAnalyzerBase {
    
        private final Version version;
    
        public YourAnalyzer(final Version version) {
            super();
            this.version = version;
        }
    
        @Override
        protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
            final Tokenizer source = new KeywordTokenizer(reader);
            return new TokenStreamComponents(source, new LowerCaseFilter(this.version, source));
        }
    
    }
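
To illustrate why both answers pair KeywordTokenizer with LowerCaseFilter instead of using LowerCaseTokenizer, here is a minimal plain-Java sketch that only emulates the two behaviors. It is not Lucene code: the class `TokenizerSketch` and its methods are hypothetical names made up for this example, and the emulation assumes simple per-character handling rather than Lucene's exact code-point logic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class TokenizerSketch {

    // Emulates KeywordTokenizer + LowerCaseFilter:
    // the whole input becomes one token, which is then lowercased.
    public static List<String> keywordLowercase(String input) {
        return Collections.singletonList(input.toLowerCase(Locale.ROOT));
    }

    // Emulates LowerCaseTokenizer: the input is split at every
    // non-letter character and each piece is lowercased.
    public static List<String> letterLowercase(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String value = "New-York 42";
        System.out.println(keywordLowercase(value)); // prints [new-york 42]
        System.out.println(letterLowercase(value));  // prints [new, york]
    }
}
```

With LowerCaseTokenizer the field value would be broken into several tokens and the digits dropped, so it does not meet the "entire stream as a single token" requirement in the question.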
    