Fuzzy search algorithm (approximate string matching algorithm)

予麋鹿 2020-12-22 17:36

I wish to create a fuzzy search algorithm. However, after hours of research, I am really struggling.

I want to create an algorithm that performs a fuzzy search on a l

6 answers

甜味超标 (original poster)

2020-12-22 17:48

I wrote an article about how I implemented a fuzzy search:

https://medium.com/@Srekel/implementing-a-fuzzy-search-algorithm-for-the-debuginator-cacc349e6c55

The implementation is on GitHub and is in the public domain, so feel free to have a look.

https://github.com/Srekel/the-debuginator/blob/master/the_debuginator.h#L1856

The basics are: split all the strings you'll be searching through into parts. So if you have paths, then "C:\documents\lol.txt" might become "C", "documents", "lol", "txt".
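
As a rough illustration of the splitting step, here is a minimal Python sketch. The function name `split_parts` is hypothetical, and the choice of separators (any run of characters that are not ASCII letters or digits) is an assumption, not necessarily what the linked implementation does.

```python
import re

def split_parts(text: str) -> list[str]:
    """Split a string into parts on any run of non-alphanumeric characters."""
    # Assumption: separators such as ":", "\\", "/", "." and spaces are dropped.
    # Empty pieces (for example from a leading separator) are skipped.
    return [part for part in re.split(r"[^0-9A-Za-z]+", text) if part]

print(split_parts(r"C:\documents\lol.txt"))  # ['C', 'documents', 'lol', 'txt']
```

Splitting once up front means every later scoring step can work with short parts instead of the whole path.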

Lowercase these strings so that the search is case-insensitive. (Maybe only do this if the search string is all lowercase.)
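
The "only if the search string is all lowercase" idea could look something like this. It is a sketch under assumptions: `normalize` is a hypothetical helper name, and treating any uppercase letter in the query as a request for case-sensitive matching is one possible reading of the rule.

```python
def normalize(query: str, parts: list[str]) -> tuple[str, list[str]]:
    """Fold case only when the query itself contains no uppercase letters."""
    if query == query.lower():
        # All-lowercase query: compare against lowercased parts.
        return query, [part.lower() for part in parts]
    # The query contains uppercase letters: keep the parts as they are,
    # so the comparison stays case-sensitive.
    return query, parts

print(normalize("doc", ["C", "Documents"]))  # ('doc', ['c', 'documents'])
print(normalize("Doc", ["C", "Documents"]))  # ('Doc', ['C', 'Documents'])
```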

Then match your search string against these parts. In my case I want to match it regardless of order, so "loldoc" would still match the above path even though "lol" comes after "doc".
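
One way to get order-independent matching is to cover the query greedily with chunks that each occur inside some part. The sketch below is an assumption about how this could be done, not the approach from the linked code; `match_any_order` is a hypothetical name, and it only handles contiguous chunks.

```python
from typing import Optional

def match_any_order(query: str, parts: list[str]) -> Optional[list[tuple[str, int]]]:
    """Cover the query with chunks that each occur inside some part.

    Returns a list of (chunk, part_index) pairs, or None if some
    character of the query cannot be found in any part.
    """
    matches = []
    pos = 0
    while pos < len(query):
        found = None
        # Try the longest remaining chunk first and shrink it until a part contains it.
        for end in range(len(query), pos, -1):
            chunk = query[pos:end]
            index = next((i for i, part in enumerate(parts) if chunk in part), None)
            if index is not None:
                found = (chunk, index)
                break
        if found is None:
            return None
        matches.append(found)
        pos += len(found[0])
    return matches

parts = ["c", "documents", "lol", "txt"]
print(match_any_order("loldoc", parts))  # [('lol', 2), ('doc', 1)]
```

Because each chunk is looked up in all parts, the order of the parts in the path does not matter.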

The matching needs some scoring to be good. The most important part, I think, is consecutive matching: the more characters that match directly after one another, the better. So "doc" is better than "dcm".
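
To make the consecutive-match idea concrete, here is a small sketch. The scoring rule (each matched character earns points equal to the length of the current run) and the name `consecutive_score` are assumptions chosen for illustration. The match is a simple leftmost one, so it will not always find the best-scoring alignment.

```python
def consecutive_score(query: str, part: str) -> int:
    """Score a leftmost subsequence match, rewarding consecutive characters."""
    score = 0
    run = 0          # length of the current run of adjacent matches
    previous = -2    # index of the previously matched character (none yet)
    search_from = 0
    for char in query:
        index = part.find(char, search_from)
        if index == -1:
            return 0  # the query is not a subsequence of this part
        # Extend the run if this match directly follows the previous one.
        run = run + 1 if index == previous + 1 else 1
        score += run
        previous = index
        search_from = index + 1
    return score

print(consecutive_score("doc", "documents"))  # 6 (runs of 1 + 2 + 3)
print(consecutive_score("dcm", "documents"))  # 3 (three isolated matches)
```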

Then you'll likely want to give extra points for a match at the start of a part. So you get more points for "doc" than "ocu".

In my case I also give more points for matching the end of a part.
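
The start and end bonuses might be sketched like this. The constants `START_BONUS` and `END_BONUS` and the function `boundary_bonus` are hypothetical, and the actual weights would need tuning for your data.

```python
START_BONUS = 2  # hypothetical weight for matching the start of a part
END_BONUS = 1    # hypothetical weight for matching the end of a part

def boundary_bonus(chunk: str, part: str) -> int:
    """Return extra points when the chunk sits at the start or end of the part."""
    if not chunk or chunk not in part:
        return 0
    bonus = 0
    if part.startswith(chunk):
        bonus += START_BONUS
    if part.endswith(chunk):
        bonus += END_BONUS
    return bonus

print(boundary_bonus("doc", "documents"))   # 2
print(boundary_bonus("ocu", "documents"))   # 0
print(boundary_bonus("ents", "documents"))  # 1
```

A chunk that covers the whole part, such as "lol" against "lol", earns both bonuses in this sketch.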

And finally, you may want to consider giving extra points for matching the last part(s). That way, matching the file name or extension scores higher than matching the folders leading up to it.
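
A bonus for the last part(s) could be as simple as the following sketch. The name `last_parts_bonus` and the default values (favoring the last two parts with three extra points) are assumptions for illustration only.

```python
def last_parts_bonus(part_index: int, part_count: int,
                     favored: int = 2, bonus: int = 3) -> int:
    """Return extra points when the matched part is among the last `favored` parts."""
    return bonus if part_index >= part_count - favored else 0

parts = ["c", "documents", "lol", "txt"]
for index, part in enumerate(parts):
    # With the defaults, "lol" and "txt" get the bonus; "c" and "documents" do not.
    print(part, last_parts_bonus(index, len(parts)))
```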
