Using a Combination of Wildcards and Stemming

闹比i 2020-12-30 09:38

I'm using a snowball analyzer to stem the titles of multiple documents. Everything works well, but there are some quirks.

Example:

A search for "valv",

4 Answers
  •  无人及你
    2020-12-30 10:28

    I have used two different approaches to solve this before:

    1. Use two fields: one containing stemmed terms, the other containing terms generated by, say, the StandardAnalyzer. When you parse the search query, check whether it is a wildcard search: if it is, search the "standard" field; if not, use the field with stemmed terms. This may be harder to do if your users type their queries directly into Lucene's QueryParser.

    2. Write a custom analyzer and index overlapping tokens. This basically consists of indexing the original term and the stem at the same position in the index using the PositionIncrementAttribute (a position increment of 0 places a token at the same position as the token before it). You can look at SynonymFilter for an example of how to use the PositionIncrementAttribute correctly.
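As a rough illustration of the first approach, here is a toy model in plain Java with no Lucene dependency, so it only shows the routing idea. The class `TwoFieldIndex`, the field names `title_standard` and `title_stemmed`, and `toyStem` are all hypothetical; `toyStem` is a crude stand-in for a Snowball stemmer, not the real algorithm, and only a trailing `*` is treated as a wildcard. In Lucene itself the two fields would typically come from indexing the title twice with different analyzers (for example via `PerFieldAnalyzerWrapper`).

```java
import java.util.*;

// Toy model of approach #1 in plain Java (no Lucene). Each title is indexed
// into two hypothetical "fields": one with stemmed terms, one with raw terms.
class TwoFieldIndex {
    // field name -> term -> ids of documents containing that term
    private final Map<String, Map<String, Set<Integer>>> fields = new HashMap<>();

    // Hypothetical stand-in for a Snowball stemmer (NOT the real algorithm):
    // strips a trailing "s", then a trailing "e", so "valves" -> "valv".
    static String toyStem(String term) {
        String t = term;
        if (t.length() > 3 && t.endsWith("s")) t = t.substring(0, t.length() - 1);
        if (t.length() > 3 && t.endsWith("e")) t = t.substring(0, t.length() - 1);
        return t;
    }

    private void add(String field, String term, int docId) {
        fields.computeIfAbsent(field, f -> new HashMap<>())
              .computeIfAbsent(term, t -> new TreeSet<>())
              .add(docId);
    }

    // Index a title into both fields.
    void index(int docId, String title) {
        for (String raw : title.toLowerCase().split("\\W+")) {
            if (raw.isEmpty()) continue;
            add("title_standard", raw, docId);         // unstemmed copy
            add("title_stemmed", toyStem(raw), docId); // stemmed copy
        }
    }

    // Route the query: trailing-'*' prefix queries go to the unstemmed field,
    // plain terms are stemmed and looked up in the stemmed field.
    Set<Integer> search(String query) {
        String q = query.toLowerCase();
        Set<Integer> hits = new TreeSet<>();
        if (q.endsWith("*")) {
            String prefix = q.substring(0, q.length() - 1);
            for (Map.Entry<String, Set<Integer>> e
                    : fields.getOrDefault("title_standard", Map.of()).entrySet()) {
                if (e.getKey().startsWith(prefix)) hits.addAll(e.getValue());
            }
        } else {
            hits.addAll(fields.getOrDefault("title_stemmed", Map.of())
                              .getOrDefault(toyStem(q), Set.of()));
        }
        return hits;
    }
}
```

After indexing "Check valve" as document 1 and "Valves and fittings" as document 2, `search("valve")`, `search("valve*")` and `search("valv*")` all return both documents. Had `valve*` been matched against the stemmed field instead, it would find nothing in this toy, because the only form indexed there is `valv`.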

    I prefer solution #2.
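The second approach can be sketched the same way, again as a plain-Java toy rather than real Lucene code. `OverlappingTokens`, its nested `Token` class and `toyStem` are hypothetical names; `Token.posInc` mimics what Lucene's PositionIncrementAttribute carries, and the stemmer is the same crude stand-in as before, not Snowball. The "analyzer" emits each original term with increment 1 and, when it differs, the stem with increment 0, so both forms share one position in a single field. Note that in this single field a wildcard also sees the stems (for example `valv*` matches `valv` itself).

```java
import java.util.*;

// Toy model of approach #2 in plain Java (no Lucene): the "analyzer" emits
// each original term and, if different, its stem at the SAME position
// (position increment 0), so one field holds both forms.
class OverlappingTokens {
    // A term plus its position increment, mirroring what Lucene's
    // CharTermAttribute and PositionIncrementAttribute carry per token.
    static final class Token {
        final String term;
        final int posInc;
        Token(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    // Hypothetical stand-in stemmer (NOT real Snowball): "valves" -> "valv".
    static String toyStem(String term) {
        String t = term;
        if (t.length() > 3 && t.endsWith("s")) t = t.substring(0, t.length() - 1);
        if (t.length() > 3 && t.endsWith("e")) t = t.substring(0, t.length() - 1);
        return t;
    }

    // Lowercase, split on non-word characters, then stack the stem on the original.
    static List<Token> analyze(String text) {
        List<Token> out = new ArrayList<>();
        for (String raw : text.toLowerCase().split("\\W+")) {
            if (raw.isEmpty()) continue;
            out.add(new Token(raw, 1));          // original: advance one position
            String stem = toyStem(raw);
            if (!stem.equals(raw)) {
                out.add(new Token(stem, 0));     // stem: same position as original
            }
        }
        return out;
    }

    // term -> docId -> positions at which the term occurs
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    void index(int docId, String text) {
        int pos = -1;
        for (Token t : analyze(text)) {
            pos += t.posInc;  // an increment of 0 reuses the previous position
            postings.computeIfAbsent(t.term, k -> new TreeMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos);
        }
    }

    List<Integer> positions(String term, int docId) {
        return postings.getOrDefault(term, Map.of()).getOrDefault(docId, List.of());
    }

    // One field serves both query types: wildcards match raw terms (and stems),
    // plain terms are stemmed before lookup.
    Set<Integer> search(String query) {
        String q = query.toLowerCase();
        Set<Integer> hits = new TreeSet<>();
        if (q.endsWith("*")) {
            String prefix = q.substring(0, q.length() - 1);
            postings.forEach((term, docs) -> {
                if (term.startsWith(prefix)) hits.addAll(docs.keySet());
            });
        } else {
            hits.addAll(postings.getOrDefault(toyStem(q), Map.of()).keySet());
        }
        return hits;
    }
}
```

For "Check valve", both `valve` and `valv` are recorded at position 1, right after `check` at position 0, which is what keeps phrase matching consistent whichever form a query uses. In real Lucene the stacking would live in a custom `TokenFilter` that emits the stem after the original with `setPositionIncrement(0)` on its PositionIncrementAttribute. Newer Lucene releases also ship `KeywordRepeatFilter`, meant to be combined with a keyword-aware stemmer and `RemoveDuplicatesTokenFilter` for exactly this pattern, so it may be worth checking whether your version has it before writing your own filter.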
