How to perform a wildcard search in Lucene

Posted by 本小妞迷上赌 on 2019-12-06 11:54:29

Question


I know that Lucene has extensive support for wildcard searches and I know you can search for things like:

Stackover* (which will return Stackoverflow)

That said, my users aren't interested in learning a query syntax. Can Lucene perform this type of wildcard search using an out-of-the-box Analyzer? Or should I append "*" to every search query?


Answer 1:


Doing this with string manipulations is tricky to get right, especially since the QueryParser supports boosting, phrases, etc.

You could use a QueryVisitor that rewrites TermQuery into PrefixQuery.

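// Rewrites every TermQuery in a parsed query tree into a PrefixQuery,
// so boosts, phrases and boolean structure produced by the QueryParser are kept.
public class PrefixRewriter : QueryVisitor {
    protected override Query VisitTermQuery(TermQuery query) {
        var term = query.GetTerm();
        // A PrefixQuery on "stackover" matches every term starting with "stackover".
        var newQuery = new PrefixQuery(term);
        // Carry the original boost over to the replacement query.
        return CopyBoost(query, newQuery);
    }
}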

The QueryVisitor class is described in the blog post "A QueryVisitor for Lucene".

Update a few years later:

The blog post has been returning a 404 for a long time, but the source still lives on! It can now be found on GitHub.




Answer 2:


If you are considering turning every query into a wildcard, I would ask these questions:

  1. Is Lucene the best tool for the job? By default, wildcards rewrite to constant-score queries, which means you are throwing away relevance ranking completely and are no longer "searching" but "matching". Perhaps a search engine library is not the best solution for your application, and another tool (e.g. a database) would be better.
  2. If the answer to #1 is still 'yes', then I would recommend taking a look at the exact relevance problem you are trying to solve. For example, if you want queries to match compound or stemmed words, consider adding a decompounder or stemmer to your analysis chain instead. An n-gram indexing technique is another alternative.
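To make the n-gram suggestion in point 2 a little more concrete: one common variant is edge n-grams, where each word is expanded into its leading prefixes at index time, so that an ordinary (ranked) term query for "stackover" can match without any wildcard. The following is only a minimal sketch of that expansion in plain C#. The class and method names are hypothetical, and it is not tied to any Lucene.Net analyzer or token filter, which is where this would normally happen in a real analysis chain.

```csharp
using System;
using System.Collections.Generic;

public static class EdgeNGrams
{
    // Returns the leading substrings of `word` whose length lies between
    // minGram and maxGram (inclusive), lower-cased to mimic a typical analyzer.
    public static List<string> Expand(string word, int minGram, int maxGram)
    {
        if (word == null) throw new ArgumentNullException(nameof(word));
        if (minGram < 1 || maxGram < minGram)
            throw new ArgumentOutOfRangeException(nameof(minGram));

        var grams = new List<string>();
        string normalized = word.ToLowerInvariant();
        // Never ask for a prefix longer than the word itself.
        int longest = Math.Min(maxGram, normalized.Length);
        for (int length = minGram; length <= longest; length++)
        {
            grams.Add(normalized.Substring(0, length));
        }
        return grams;
    }
}
```

For example, EdgeNGrams.Expand("Stackoverflow", 3, 9) yields the prefixes from "sta" up to "stackover". The trade-off is a larger index in exchange for cheaper queries that keep relevance scoring.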



Answer 3:


When I want to do something like that, I normally format the term before searching, e.g.:

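// Escape characters that have a special meaning in the query syntax.
searchTerm = QueryParser.Escape(searchTerm);
// Append a trailing wildcard unless the input ends with a space.
if (!searchTerm.EndsWith(" "))
{
    searchTerm = string.Format("{0}*", searchTerm);
}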

This escapes any special characters the user has entered and, if the term doesn't end with a space, appends a * to the end. Note that a * on its own would cause a parsing exception.
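The snippet above treats the whole input as a single term. For multi-word input, one possible variation is to escape each word and append * to each of them, returning nothing at all for blank input so that a bare * is never sent to the parser. This is a rough sketch under stated assumptions: PrefixQueryText, Build and EscapeTerm are hypothetical names, and EscapeTerm is a hand-written stand-in for QueryParser.Escape so the example runs without Lucene.Net. Its list of special characters is an assumption based on the classic query syntax and may differ between versions, so in real code QueryParser.Escape is the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PrefixQueryText
{
    // Characters treated as special in the classic query syntax (assumed list).
    private const string SpecialChars = "\\+-!():^[]\"{}~*?|&/";

    // Hypothetical stand-in for QueryParser.Escape:
    // puts a backslash in front of every special character.
    public static string EscapeTerm(string term)
    {
        var sb = new StringBuilder(term.Length * 2);
        foreach (char c in term)
        {
            if (SpecialChars.IndexOf(c) >= 0) sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Splits the input on whitespace, escapes each word and appends '*'.
    // Returns an empty string for blank input so a bare "*" is never produced.
    public static string Build(string userInput)
    {
        if (string.IsNullOrWhiteSpace(userInput)) return string.Empty;

        var parts = new List<string>();
        string[] words = userInput.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            parts.Add(EscapeTerm(word) + "*");
        }
        return string.Join(" ", parts);
    }
}
```

With this sketch, PrefixQueryText.Build("Stack over") produces the query text Stack* over*, and the caller can skip the search entirely when the result is empty.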



Source: https://stackoverflow.com/questions/5746809/how-to-perform-a-wildcard-search-in-lucene
