Solr/Lucene fuzzy search too slow

Submitted by 给你一囗甜甜゛ on 2019-12-02 07:20:32

Your problem is not related to the analyzer you use. When you search for Califrna~0.7, Lucene iterates over every term in the index and calculates the (Levenshtein) edit distance between "Califrna" and each of them. This is a very expensive operation.
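To make the cost concrete, here is a rough Python sketch of what such a brute-force scan amounts to. It is not Lucene's actual code: the term list is made up, and the similarity formula (1 - distance / length of the shorter string) is my assumption about how the old FuzzyQuery turned a distance into a score like 0.7.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def brute_force_fuzzy(query: str, terms, min_similarity: float):
    """Score the query against every term, the way a brute-force scan would."""
    matches = []
    for term in terms:  # one full distance computation per indexed term
        distance = levenshtein(query, term)
        # Assumed similarity formula, see the note above.
        similarity = 1.0 - distance / min(len(query), len(term))
        if similarity >= min_similarity:
            matches.append((term, similarity))
    return sorted(matches, key=lambda m: -m[1])


# Hypothetical, tiny "index"; a real one can hold millions of terms.
terms = ["california", "calendar", "carolina", "colorado", "florida"]
print(brute_force_fuzzy("califrna", terms, 0.7))
```

The loop runs once per term in the index, so the query time grows with the size of the term dictionary, regardless of how few terms actually match.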

This issue will be solved in Lucene 4.0. Unfortunately, the Lucene version that ships with Solr still uses the old brute-force approach.

https://issues.apache.org/jira/browse/LUCENE-2089

http://java.dzone.com/news/lucenes-fuzzyquery-100-times

If that is acceptable for you, I would suggest downloading Solr/Lucene from trunk and testing how the new fuzzy query works.

http://wiki.apache.org/solr/NightlyBuilds

Even though trunk is stable, it is not recommended for production use. I can suggest two similar alternatives:

1 - SpellChecker

http://wiki.apache.org/solr/SpellCheckComponent

http://www.lucidimagination.com/blog/2010/08/31/getting-started-spell-checking-with-apache-lucene-and-solr/

SpellChecker builds its own small index of n-grams in order to perform fast lookups. It also uses Levenshtein distance, but instead of iterating over all terms it only calculates the distance for related terms.
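The idea can be sketched in a few lines of Python. This is a simplified illustration, not how Lucene's SpellChecker is actually implemented: the bigram size, the min_shared threshold and the term list are all assumptions chosen for the example.

```python
from collections import defaultdict


def ngrams(word: str, n: int = 2):
    """Return the set of character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def build_ngram_index(terms, n: int = 2):
    """Map each n-gram to the set of terms that contain it."""
    index = defaultdict(set)
    for term in terms:
        for gram in ngrams(term, n):
            index[gram].add(term)
    return index


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def suggest(query, index, n=2, min_shared=2, max_distance=2):
    """Use shared n-grams to pick candidates, then rank only those by distance."""
    shared = defaultdict(int)
    for gram in ngrams(query, n):
        for term in index.get(gram, ()):
            shared[term] += 1
    candidates = [t for t, count in shared.items() if count >= min_shared]
    # The expensive distance computation runs on the candidates only.
    scored = sorted((edit_distance(query, t), t) for t in candidates)
    return [t for distance, t in scored if distance <= max_distance]


terms = ["california", "calendar", "carolina", "colorado", "florida"]
index = build_ngram_index(terms)
print(suggest("califrna", index))
```

In this toy example "colorado" and "florida" share no bigram with "califrna", so no edit distance is ever computed for them. On a large index that pruning is where the speedup comes from.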

You first need to run the spell checker for "Califrna", and it will suggest "California". Then you can use "California" in the query against your main index, without a fuzzy query.
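As a rough sketch of that two-step flow, the snippet below only builds the two request URLs. The host, the /spell and /select handler paths, and the field name "state" are assumptions about your setup; the spellcheck parameters are the ones documented for the SpellCheckComponent. How you read the suggestion out of the response depends on your Solr version, so that part is left out.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr address


def spellcheck_url(term: str, handler: str = "/spell") -> str:
    """Step 1: ask the spell checker for a correction of the raw term."""
    params = {
        "q": term,
        "spellcheck": "true",
        "spellcheck.count": 1,  # only the best suggestion is needed
        "wt": "json",
    }
    return f"{SOLR_BASE}{handler}?{urlencode(params)}"


def exact_query_url(field: str, corrected: str, handler: str = "/select") -> str:
    """Step 2: query the main index with the corrected term, no '~' operator."""
    params = {"q": f"{field}:{corrected}", "wt": "json"}
    return f"{SOLR_BASE}{handler}?{urlencode(params)}"


print(spellcheck_url("Califrna"))
# "state" is a hypothetical field name.
print(exact_query_url("state", "California"))
```

The second request is an ordinary term query, so it avoids the full term scan that the fuzzy query triggers.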

2 - Auto Suggest

http://wiki.apache.org/solr/Suggester

You can offer the correct spelling as the user types the query, using the suggester component. This will be a lot faster. It supports fuzzy search with the JaspellLookup class. JaspellLookup needs to be updated in order to enable fuzzy search, though the wiki does not say much about what needs to be updated. If usePrefix is set to false, it should perform fuzzy lookup, I guess.
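For intuition only, this is roughly what a prefix-based suggestion lookup does, sketched in Python over a sorted term list. It is not JaspellLookup (which is tree-based) and it does no fuzzy matching; the function names and the term list are made up for the example.

```python
import bisect


def build_suggester(terms):
    """Keep the terms lowercased and sorted so prefixes form contiguous runs."""
    return sorted(t.lower() for t in terms)


def suggest_prefix(sorted_terms, prefix: str, limit: int = 5):
    """Return up to `limit` terms that start with what the user has typed so far."""
    prefix = prefix.lower()
    start = bisect.bisect_left(sorted_terms, prefix)  # binary search, no full scan
    results = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # past the run of terms sharing this prefix
        results.append(term)
        if len(results) == limit:
            break
    return results


suggester = build_suggester(["California", "Calendar", "Carolina", "Colorado", "Florida"])
print(suggest_prefix(suggester, "cali"))
```

A user who has typed "Cali" is already offered "california" before the misspelling happens, which is why this route can avoid the fuzzy query altogether.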
