Lightweight fuzzy search library

依然范特西╮ submitted on 2019-11-28 20:43:46

Lucene is very scalable, which means it's good for little applications too. You can create an index in memory very quickly if that's all you need.

For fuzzy searching, you really need to decide what algorithm you'd like to use. For information retrieval, I have used an n-gram technique with Lucene successfully. But that's a special indexing technique, not a "library" in itself.
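To make the n-gram idea concrete, here is a minimal sketch in plain Python. It is not Lucene and not the setup described above; the function names, the trigram size, the space padding and the Jaccard scoring are all assumptions chosen for illustration. Each string is broken into overlapping character trigrams, an inverted index maps each trigram to the entries containing it, and candidates are ranked by how many trigrams they share with the query.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of character n-grams of text, padded with spaces."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard similarity of the two n-gram sets (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def build_index(entries, n=3):
    """Map each n-gram to the positions of the entries that contain it."""
    index = defaultdict(set)
    for position, entry in enumerate(entries):
        for gram in ngrams(entry, n):
            index[gram].add(position)
    return index


def search(query, entries, index, n=3, threshold=0.3):
    """Return (score, entry) pairs at or above threshold, best first."""
    candidates = set()
    for gram in ngrams(query, n):
        candidates |= index.get(gram, set())
    scored = [(similarity(query, entries[i], n), entries[i]) for i in candidates]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


entries = ["lucene", "sphinx", "lucent"]
index = build_index(entries)
results = search("lucen", entries, index)
```

The index only narrows the candidate set; the threshold of 0.3 is arbitrary and would need tuning against real data.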

Without knowing more about your application, it won't be easy to recommend a suitable library. How much data are you searching? What format is the data? How often is the data updated?

I'm not sure how well Lucene is suited to fuzzy searching; a custom library might be a better choice. For example, this search is written in Java and works pretty fast, but it is custom-made for that task: http://www.softcorporation.com/products/people/

Soundex is very 'English' in its encoding. Daitch-Mokotoff works better for many names, especially European (Germanic) and Jewish names. In my UK-centric world, it's what I use.
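For readers who have not seen it, this is a rough sketch of classic American Soundex in Python, written from the commonly published rules rather than taken from any particular library. The function name and the choice to ignore non-ASCII characters are assumptions. Daitch-Mokotoff uses a much larger rule table and is not shown.

```python
# Consonant groups that share a Soundex digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the 4-character Soundex code of name, or "" if it has no letters."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(letter, "")  # vowels and y have no code
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
        if len(result) == 4:
            break
    return result.ljust(4, "0")
```

With this sketch, "Robert" and "Rupert" both encode to R163, while "Schwarz" and "Shvarts" come out as S620 and S163, which gives a feel for how the English-oriented letter groups can split spellings that sound alike.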

Wiki here.

You didn't specify your development platform, but if it's PHP then I suggest you look at the Zend Lucene library:

http://ifacethoughts.net/2008/02/07/zend-brings-lucene-to-php/
http://framework.zend.com/manual/en/zend.search.lucene.html

Because it runs on a LAMP stack, it's far lighter than Lucene on Java, and it can easily be extended to other file types, provided you can find a conversion library or command-line converter; there are lots of OSS solutions around for this.

Try Walnutil. It is based on the Lucene API and integrates with SQL Server and Oracle databases. You can create any type of index and then use it. For simple searches you can use some of the methods from walnutilsoft; for more complicated cases you can use the Lucene API directly. See the web-based example that uses indexes created with the Walnutil tools. There are also code examples in Java and C# that you can use to build different types of search. The tools are free. http://www.walnutilsoft.com/

If you can choose to use a database, I recommend using PostgreSQL and its fuzzy string matching functions.

If you can use Ruby, I suggest looking into the amatch library.

@aku - links to working soundex libraries are right there at the bottom of the page.

As for Levenshtein distance, the Wikipedia article on that also has implementations listed at the bottom.
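As a point of reference, a minimal Python sketch of the standard dynamic-programming Levenshtein distance follows. It is the textbook two-row version, not an optimized one, and the function name is just an illustrative choice.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, 1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]
```

For example, levenshtein("kitten", "sitting") is 3. The running time is proportional to the product of the two lengths, so comparing a query against every entry in a large data set this way gets slow quickly.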

Check this link out. It uses the Levenshtein distance metric but is much faster. http://narenonit.blogspot.com/2012/07/fuzzy-matching-autocomplete-library.html

alexandru.topliceanu

A powerful, lightweight solution is Sphinx.

It's smaller than Lucene and it supports disambiguation.

It's written in C++, it's fast and battle-tested, it has libraries for every environment, and it's used by large companies like Craigslist.
