levenshtein-distance

Fastest general purpose Levenshtein Javascript implementation

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-10 19:01:47
Question: I am looking for a good general-purpose Levenshtein implementation in JavaScript. It must be fast and work well for both short and long strings. It will also be called many times (hence the caching). The most important thing is that it calculates a plain, simple Levenshtein distance. I came up with this:

    var levenshtein = (function() {
      var row2 = [];
      return function(s1, s2) {
        if (s1 === s2) {
          return 0;
        } else {
          var s1_len = s1.length, s2_len = s2.length;
          if (s1_len && s2_len) {
            var i1 = 0, i2 = 0,
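For comparison, here is a minimal Python sketch of the rolling-row technique that the excerpt's `row2` buffer hints at. It is not the asker's JavaScript code (which is cut off above); the function name `levenshtein` and the choice to keep the shorter string in the inner loop are assumptions made for illustration.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Plain Levenshtein distance using two rolling rows instead of a full matrix."""
    if s1 == s2:
        return 0
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    if not s2:
        return len(s1)

    # previous[j] holds the distance between the processed prefix of s1 and s2[:j].
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

Only two rows are alive at any time, so memory grows with the shorter string's length while the running time stays proportional to the product of the two lengths.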

Calculating the complexity of Levenshtein Edit Distance

我的未来我决定 submitted on 2019-12-10 16:48:17
Question: I have been looking at this simple Python implementation of Levenshtein edit distance all day now.

    def lev(a, b):
        """Recursively calculate the Levenshtein edit distance between two strings, a and b.
        Returns the edit distance.
        """
        if("" == a):
            return len(b)   # returns if a is an empty string
        if("" == b):
            return len(a)   # returns if b is an empty string
        return min(lev(a[:-1], b[:-1])+(a[-1] != b[-1]), lev(a[:-1], b)+1, lev(a, b[:-1])+1)

From: http://www.clear.rice.edu/comp130/12spring
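To make the cost of that recursion concrete, the sketch below counts how many calls the naive version makes and contrasts it with a memoized variant. The names `lev_naive`, `lev_memo`, and the `counter` argument are hypothetical additions for illustration and are not part of the quoted code.

```python
from functools import lru_cache


def lev_naive(a: str, b: str, counter: list) -> int:
    """Same recursion as the quoted lev(), with a call counter added."""
    counter[0] += 1
    if a == "":
        return len(b)
    if b == "":
        return len(a)
    return min(
        lev_naive(a[:-1], b[:-1], counter) + (a[-1] != b[-1]),
        lev_naive(a[:-1], b, counter) + 1,
        lev_naive(a, b[:-1], counter) + 1,
    )


def lev_memo(a: str, b: str) -> int:
    """Memoized variant: at most (len(a) + 1) * (len(b) + 1) distinct subproblems."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> int:
        if i == 0:
            return j
        if j == 0:
            return i
        return min(
            go(i - 1, j - 1) + (a[i - 1] != b[j - 1]),
            go(i - 1, j) + 1,
            go(i, j - 1) + 1,
        )

    return go(len(a), len(b))


calls = [0]
lev_naive("ab", "ab", calls)
print(calls[0])  # number of calls made for two 2-character strings
```

Every non-trivial call branches into three further calls, so the call count of the naive version grows exponentially with the string lengths, whereas the memoized version does a bounded amount of work per (i, j) pair.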

Modify Levenshtein-Distance to ignore order

岁酱吖の submitted on 2019-12-10 15:08:12
Question: I'm looking to compute the Levenshtein distance between sequences containing up to 6 values. The order of these values should not affect the distance. How would I implement this in the iterative or recursive algorithm? Example:

    # Currently
    >>> LDistance('dog', 'god')
    2
    # Sorted
    >>> LDistance('dgo', 'dgo')
    0
    # Proposed
    >>> newLDistance('dog', 'god')
    0

'dog' and 'god' have exactly the same letters, so sorting the strings beforehand will return the desired result. However this doesn't work all
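One possible reading of "order should not affect the distance" is to treat each sequence as a multiset, so that reordering is free and only unmatched items cost anything. The sketch below follows that assumption; `unordered_distance` is a hypothetical name, and this is not necessarily the behaviour the asker had in mind for `newLDistance`.

```python
from collections import Counter


def unordered_distance(a, b) -> int:
    """Edit distance between two sequences when element order is ignored.

    Items present in both sequences are matched for free. Leftover items are
    paired up as substitutions, and whatever still remains on the longer side
    counts as insertions or deletions.
    """
    count_a, count_b = Counter(a), Counter(b)
    only_in_a = sum((count_a - count_b).values())
    only_in_b = sum((count_b - count_a).values())
    return max(only_in_a, only_in_b)


print(unordered_distance("dog", "god"))
```

Because it never looks at positions, this runs in time linear in the combined length of the two sequences.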

C# LevenshteinDistance algorithm for spellchecker

好久不见. submitted on 2019-12-10 12:23:22
Question: Hi, I'm using the Levenshtein algorithm to calculate the difference between two strings, using the code below. It currently provides the total number of changes needed to get from 'answer' to 'target', but I'd like to split these up by the type of error being made, classifying each error as a deletion, substitution, or insertion. I've tried adding a simple count, but I'm new at this and don't really understand how the code works, so I'm not sure how to go about it. static class
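One common way to get per-type counts is to fill the full distance matrix and then walk back from the bottom-right cell, recording which move produced each value. The sketch below is in Python rather than the asker's C# (whose code is cut off above); `classify_edits` is a hypothetical name, and when several optimal edit scripts exist it simply prefers substitutions.

```python
def classify_edits(answer: str, target: str):
    """Return (distance, counts) where counts splits the edits by type."""
    m, n = len(answer), len(target)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if answer[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a character of answer
                dist[i][j - 1] + 1,         # insert a character of target
                dist[i - 1][j - 1] + cost,  # substitute or match
            )

    # Walk back from the end, recording which operation led to each cell.
    counts = {"substitution": 0, "insertion": 0, "deletion": 0}
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and answer[i - 1] == target[j - 1] \
                and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1  # characters match, no edit
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            counts["substitution"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["deletion"] += 1
            i -= 1
        else:
            counts["insertion"] += 1
            j -= 1
    return dist[m][n], counts


print(classify_edits("kitten", "sitting"))
```

The three counts always add up to the total distance, so the classification can be checked against the existing single-number result.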

URL path similarity/string similarity algorithm

巧了我就是萌 submitted on 2019-12-09 18:55:51
Question: My problem is that I need to compare URL paths and deduce whether they are similar. Below I provide example data to process:

    # GROUP 1
    /robots.txt
    # GROUP 2
    /bot.html
    # GROUP 3
    /phpMyAdmin-2.5.6-rc1/scripts/setup.php
    /phpMyAdmin-2.5.6-rc2/scripts/setup.php
    /phpMyAdmin-2.5.6/scripts/setup.php
    /phpMyAdmin-2.5.7-pl1/scripts/setup.php
    /phpMyAdmin-2.5.7/scripts/setup.php
    /phpMyAdmin-2.6.0-alpha/scripts/setup.php
    /phpMyAdmin-2.6.0-alpha2/scripts/setup.php
    # GROUP 4
    //phpMyAdmin/

I tried Levenshtein
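As a rough illustration of how a Levenshtein-based grouping could look, the sketch below normalizes the distance by the longer path's length and greedily assigns each path to the first group whose representative is similar enough. The names `similarity` and `group_paths` and the `threshold=0.8` default are assumptions chosen for the example, not values taken from the question, and the groups produced depend heavily on that threshold.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance with a rolling row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,
                current[j - 1] + 1,
                previous[j - 1] + (ca != cb),
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def group_paths(paths, threshold: float = 0.8):
    """Greedy grouping: compare each path with the first member of each group."""
    groups = []
    for path in paths:
        for group in groups:
            if similarity(path, group[0]) >= threshold:
                group.append(path)
                break
        else:
            groups.append([path])  # no existing group was close enough
    return groups


sample = [
    "/robots.txt",
    "/phpMyAdmin-2.5.6-rc1/scripts/setup.php",
    "/phpMyAdmin-2.5.6-rc2/scripts/setup.php",
]
for group in group_paths(sample):
    print(group)
```

Greedy grouping is order-dependent; a different input order or representative choice can yield different groups.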

python-Levenshtein ratio calculation

烈酒焚心 submitted on 2019-12-08 20:02:58
I have the following two strings:

    a = 'bjork gudmundsdottir'
    b = 'b. gudmundsson gunnar'

The Levenshtein distance between the two is 12. When I use the following formula for Levenshtein distance, I get a discrepancy of 0.01 with the python-Levenshtein library:

    >>> Ldist / max(len( a ), len( b ))
    >>> float(12)/21
    0.5714285714285714
    # python-Levenshtein
    >>> Levenshtein.ratio(a,b)
    0.5853658536585366
    # difflib
    >>> seq=difflib.SequenceMatcher(a=a,b=b)
    >>> seq.ratio()
    0.5853658536585366

What accounts for this difference? What am I doing incorrectly in my calculation? Note that I have reviewed this How
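A plausible explanation, stated here as an assumption rather than something the excerpt confirms, is that the library's ratio is not `distance / max(len)` but `(lensum - ldist) / lensum`, where `lensum` is the combined length of both strings and `ldist` charges 2 for a substitution (one deletion plus one insertion). The pure-Python sketch below implements that formula; `weighted_distance` and `indel_ratio` are hypothetical names and do not call the real library.

```python
def weighted_distance(a: str, b: str, substitution_cost: int = 2) -> int:
    """Edit distance where a substitution costs `substitution_cost` (default 2)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else substitution_cost
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def indel_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]: (lensum - ldist) / lensum."""
    lensum = len(a) + len(b)
    if lensum == 0:
        return 1.0
    return (lensum - weighted_distance(a, b)) / lensum


a = 'bjork gudmundsdottir'
b = 'b. gudmundsson gunnar'
print(indel_ratio(a, b))
```

For the two strings in the question the combined length is 41 and the weighted distance is 17, so this sketch yields 24/41, roughly 0.5854, which is the value both libraries report. difflib's documented `ratio()` is `2.0 * M / T` (matched characters over total characters), which gives the same quantity whenever the matched blocks add up to a longest common subsequence, as they appear to here.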

Data structure for retrieving strings that are close by Levenshtein distance

懵懂的女人 submitted on 2019-12-08 17:26:27
Question: For example, starting with the set of English words, is there a structure/algorithm that allows fast retrieval of strings such as "light" and "tight", using the word "right" as the query? I.e., I want to retrieve strings with a small Levenshtein distance to the query string.
Answer 1: The BK-tree data structure might be appropriate here. It's designed to efficiently support queries of the form "what are all words within edit distance k or less from a query word?" Its performance guarantees are
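A minimal sketch of a BK-tree may help show how such a query works. The class name `BKTree` and its method names are assumptions made for this example; the pruning rule relies only on the triangle inequality, which Levenshtein distance satisfies.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance with a rolling row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,
                current[j - 1] + 1,
                previous[j - 1] + (ca != cb),
            ))
        previous = current
    return previous[-1]


class BKTree:
    """Each node is (word, children), with children keyed by distance to the node's word."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def query(self, word: str, k: int):
        """Return sorted (distance, word) pairs within distance k of `word`."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            candidate, children = stack.pop()
            d = self.distance(word, candidate)
            if d <= k:
                results.append((d, candidate))
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - k, d + k] can contain words within distance k.
            for edge, child in children.items():
                if d - k <= edge <= d + k:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["right", "light", "tight", "night", "banana", "rights"]:
    tree.add(w)
print(tree.query("right", 1))
```

Subtrees whose edge label falls outside the allowed window are skipped without computing any distances inside them, which is where the savings over a linear scan come from.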

levenshtein distance with items in list in python

99封情书 submitted on 2019-12-08 11:42:58
Question: I have two lists, below, and I want to find the words that are similar, i.e. that have a Levenshtein distance of less than 2. I have a function to find the Levenshtein distance, but it takes the two words as parameters. I can find which words are not in the other list, but that does not help. I could go index by index, but as in the case below, when I get to index 7 (but and except) everything is thrown off, because infidelity will be at index 9 and 8 and wcop88 at 9 and 10, hence those won't be compared. Is
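One way around the index misalignment is to stop comparing by position and instead compare every word of one list with every word of the other. The sketch below assumes that approach; `close_pairs` and the two short sample lists are hypothetical, since the asker's actual lists are cut off above.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance with a rolling row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,
                current[j - 1] + 1,
                previous[j - 1] + (ca != cb),
            ))
        previous = current
    return previous[-1]


def close_pairs(words_a, words_b, max_distance: int = 1):
    """All (a, b) pairs whose distance is at most max_distance (i.e. less than 2 by default)."""
    return [
        (a, b)
        for a, b in product(words_a, words_b)
        if levenshtein(a, b) <= max_distance
    ]


# Hypothetical sample data for illustration only.
first = ["but", "infidelity"]
second = ["except", "infidelty", "bat"]
print(close_pairs(first, second))
```

This does len(words_a) * len(words_b) distance computations, which is fine for short lists but grows quickly for large ones.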

Fuzzy matching multiple words in string

▼魔方 西西 submitted on 2019-12-08 07:33:17
Question: I'm trying to use the Levenshtein distance to find fuzzy keywords (static text) on an OCR page. To do this, I want to specify a percentage of errors that are allowed (say, 15%).

    string Keyword = "past due electric service";

Since the keyword is 25 characters long, I want to allow for 4 errors (25 * .15 rounded up). I need to be able to compare it to...

    string Entire_OCR_Page = "previous bill amount payment received on 12/26/13 thank you! current electric service total balances
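Comparing a short keyword against a whole page calls for approximate substring matching rather than whole-string distance. The sketch below uses the standard variant of the Levenshtein recurrence in which a match may start at any position of the text for free (often attributed to Sellers). It is written in Python rather than the asker's C#, and the names `best_fuzzy_match` and `keyword_found` are assumptions for illustration.

```python
import math


def best_fuzzy_match(keyword: str, text: str):
    """Return (errors, end) for the substring of text closest to keyword.

    `end` is the 1-based position in text where the best match ends.
    """
    # Distances from keyword[:i] to an empty match before any text is read.
    previous = list(range(len(keyword) + 1))
    best = (previous[-1], 0)
    for pos, ch in enumerate(text, start=1):
        # current[0] = 0: a match may start at any position of the text.
        current = [0]
        for i, kc in enumerate(keyword, start=1):
            current.append(min(
                previous[i] + 1,               # extra character in the text
                current[i - 1] + 1,            # keyword character missing from the text
                previous[i - 1] + (kc != ch),  # substitution or match
            ))
        if current[-1] < best[0]:
            best = (current[-1], pos)
        previous = current
    return best


def keyword_found(keyword: str, text: str, error_rate: float = 0.15) -> bool:
    """True if some substring of text is within ceil(len(keyword) * error_rate) edits."""
    allowed = math.ceil(len(keyword) * error_rate)
    errors, _ = best_fuzzy_match(keyword, text)
    return errors <= allowed


print(best_fuzzy_match("electric", "current electrlc service"))
print(math.ceil(len("past due electric service") * 0.15))
```

The second print reproduces the question's arithmetic: a 25-character keyword at 15% allows 4 errors.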

Fuzzy string matching using Levenshtein algorithm in Elasticsearch

时光毁灭记忆、已成空白 submitted on 2019-12-08 06:44:50
Question: I have just started exploring Elasticsearch. I created a document as follows:

    curl -XPUT "http://localhost:9200/cities/city/1" -d'
    { "name": "Saint Louis" }'

I then tried to do a fuzzy search on the name field with a Levenshtein distance of 5 as follows:

    curl -XGET "http://localhost:9200/_search" -d'
    { "query": { "fuzzy": { "name" : { "value" : "St. Louis", "fuzziness" : 5 } } } }'

But it's not returning any match. I expect the Saint Louis record to be returned. How can I fix my query? Thanks.