Lucene - Wildcards in phrases

小鲜肉 2021-01-04 11:50

I am currently attempting to use Lucene to search data populated in an index.

I can match exact phrases by enclosing them in double quotes (e.g. "Processing Documents").

7 answers
  •  予麋鹿 (original poster)
    2021-01-04 12:04

    Another alternative is to use n-grams, specifically the EdgeNGram filter: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory

    This indexes n-grams, i.e. parts of words, as terms in their own right. For example, "Documents" with a minimum n-gram size of 5 and a maximum of 8 would be indexed as: Docum, Docume, Documen, Document.
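
    As a rough illustration of what such a filter emits, here is a minimal plain-Python sketch. It is not the Solr or Lucene API; the function name `edge_ngrams` and its parameters are made up for this example, and it only mimics n-grams anchored at the front of a token.

```python
def edge_ngrams(token: str, min_size: int, max_size: int) -> list[str]:
    """Return the leading substrings of `token` with lengths min_size..max_size.

    Hypothetical helper for illustration only; a real EdgeNGram filter runs
    inside the index-time analysis chain.
    """
    longest = min(max_size, len(token))
    return [token[:n] for n in range(min_size, longest + 1)]


print(edge_ngrams("Documents", 5, 8))
# ['Docum', 'Docume', 'Documen', 'Document']
```

    Note that the full 9-character token is not among the results here, because it is longer than the maximum size of 8.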

    There is a tradeoff in index size and indexing time. One of the Solr books gives this rough guide: indexing takes 10 times longer, uses 5 times more disk space, and creates 6 times more distinct terms.

    However, EdgeNGram should do better than that, because it only generates n-grams anchored at the edge of each token rather than at every position.
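
    To get a feel for why edge n-grams are cheaper, the sketch below counts the terms produced for a single token by each approach. Both helpers are hypothetical plain-Python stand-ins, not Solr code, and a count for one word says nothing precise about real index sizes.

```python
def all_ngrams(token: str, min_size: int, max_size: int) -> list[str]:
    """Every substring of `token` with length min_size..max_size, at any offset."""
    return [
        token[start:start + n]
        for n in range(min_size, max_size + 1)
        for start in range(len(token) - n + 1)
    ]


def edge_ngrams(token: str, min_size: int, max_size: int) -> list[str]:
    """Only the substrings that start at the beginning of `token`."""
    longest = min(max_size, len(token))
    return [token[:n] for n in range(min_size, longest + 1)]


token = "Documents"
print(len(all_ngrams(token, 5, 8)))   # 14
print(len(edge_ngrams(token, 5, 8)))  # 4
```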

    You do need to make sure that you don't submit wildcard characters in your queries. You aren't doing a wildcard search; you are matching a plain search term against the indexed n-grams (parts of words).
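
    The sketch below shows the idea for a phrase: edge n-grams are generated at index time only, and the query is looked up as plain terms. It is a toy model in plain Python, not Lucene. The helper names, the whitespace tokenizer, the lowercasing, and the 3..10 size range are all assumptions chosen so that the full word "processing" is also indexed.

```python
def edge_ngrams(token: str, min_size: int, max_size: int) -> list[str]:
    """Leading substrings of `token` with lengths min_size..max_size."""
    longest = min(max_size, len(token))
    return [token[:n] for n in range(min_size, longest + 1)]


def index_terms(text: str, min_size: int = 3, max_size: int = 10) -> list[set[str]]:
    """Assumed index-time analysis: lowercase, split on whitespace, and keep
    the set of edge n-grams produced at each token position."""
    return [set(edge_ngrams(tok, min_size, max_size)) for tok in text.lower().split()]


def phrase_matches(query: str, positions: list[set[str]]) -> bool:
    """Assumed query-time analysis: lowercase and split only, no n-grams.
    The phrase matches if its terms occur at consecutive positions."""
    terms = query.lower().split()
    for start in range(len(positions) - len(terms) + 1):
        if all(term in positions[start + i] for i, term in enumerate(terms)):
            return True
    return False


doc = index_terms("Processing Documents")
print(phrase_matches("processing docum", doc))   # True
print(phrase_matches("processing docum*", doc))  # False
```

    The second query fails because "docum*" is treated as a literal term, and no indexed n-gram contains the asterisk.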
