levenshtein-distance

How to correct bugs in this Damerau-Levenshtein implementation?

末鹿安然 posted on 2019-11-30 15:58:03
I'm back with another longish question. Having experimented with a number of Python-based Damerau-Levenshtein edit distance implementations, I finally found the one listed below as editdistance_reference(). It seems to deliver correct results and appears to be efficiently implemented. So I sat down to convert the code to Cython. On my test data, the reference method manages about 11,000 comparisons per second (for pairs of words around 12 letters long), while the Cythonized method does over 200,000 comparisons per second. Sadly, the results are incorrect: when you look at the
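When porting a distance function to Cython, it can help to have a deliberately plain oracle to diff results against. The sketch below is not the poster's editdistance_reference() (which is not shown in this excerpt); it is a textbook optimal string alignment distance, the restricted Damerau-Levenshtein variant in which no substring is edited more than once. The name osa_distance is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Hypothetical helper for cross-checking a faster port; assumes plain
    unit costs for insertion, deletion, substitution and transposition.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "ab" -> "ba"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

Note that this restricted variant and the unrestricted Damerau-Levenshtein distance disagree on some inputs (for "ca" and "abc" this sketch returns 3, while the unrestricted distance is 2), so it is worth confirming which variant the reference implements before treating a mismatch as a bug.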

Levenshtein distance c# count error type

别说谁变了你拦得住时间么 posted on 2019-11-30 14:33:12
I found this bit of code that computes the Levenshtein distance between an answer and a guess:

    int CheckErrors(string Answer, string Guess)
    {
        int[,] d = new int[Answer.Length + 1, Guess.Length + 1];
        for (int i = 0; i <= Answer.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= Guess.Length; j++) d[0, j] = j;
        for (int j = 1; j <= Guess.Length; j++)
            for (int i = 1; i <= Answer.Length; i++)
                if (Answer[i - 1] == Guess[j - 1])
                    d[i, j] = d[i - 1, j - 1]; // no operation
                else
                    d[i, j] = Math.Min(Math.Min(
                        d[i - 1, j] + 1,       // a deletion
                        d[i, j - 1] + 1),      // an insertion
                        d[i - 1, j - 1] + 1    // a substitution
                    );
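One possible way to count error types, as the title asks, is to fill the same table and then walk back from the bottom-right cell, noting which neighbour each value came from. The following is a minimal Python sketch of that idea rather than a fix to the C# above; the function name count_edit_types and the tie-breaking order (substitution, then deletion, then insertion) are assumptions.

```python
def count_edit_types(answer: str, guess: str) -> dict:
    """Count deletions, insertions and substitutions on one optimal path.

    Hypothetical helper: uses the same recurrence as the C# snippet,
    then backtracks through the table. Ties are broken arbitrarily,
    so other equally short edit scripts may give a different split.
    """
    n, m = len(answer), len(guess)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if answer[i - 1] == guess[j - 1]:
                d[i][j] = d[i - 1][j - 1]  # no operation
            else:
                d[i][j] = 1 + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])

    counts = {"deletions": 0, "insertions": 0, "substitutions": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and answer[i - 1] == guess[j - 1]:
            i, j = i - 1, j - 1  # characters match, nothing to count
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            counts["substitutions"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            counts["deletions"] += 1  # character of answer missing in guess
            i -= 1
        else:
            counts["insertions"] += 1  # extra character in guess
            j -= 1
    return counts
```

The three counts always add up to the plain Levenshtein distance, which gives a quick sanity check against the original function.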

Get the closest color name depending on an hex-color

扶醉桌前 posted on 2019-11-30 14:17:20
I am trying to get the best-matching color name for a given hex value. For example, for the hex color #f00 we should get the color name red.

    '#ff0000' => 'red'
    '#000000' => 'black'
    '#ffff00' => 'yellow'

I currently use the Levenshtein distance algorithm to get the closest color name. It works well so far, but sometimes not as expected. For example:

    '#0769ad' => 'chocolate'
    '#00aaee' => 'mediumspringgreen'

So, any ideas on how to get a closer result? Here's what I wrote to get the closest color: Array.closest = (function () { // http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings
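The surprising matches are probably because string edit distance compares hex digits as characters, not as numbers, so two visually close colors can be many edits apart. A common alternative is to compare colors as points in RGB space. The sketch below swaps Levenshtein for squared Euclidean distance; it is written in Python rather than the poster's JavaScript, and the small PALETTE table and both function names are illustrative assumptions.

```python
# Hypothetical, deliberately small palette of CSS color names.
PALETTE = {
    "red": "#ff0000",
    "black": "#000000",
    "yellow": "#ffff00",
    "white": "#ffffff",
    "blue": "#0000ff",
    "green": "#008000",
    "deepskyblue": "#00bfff",
    "steelblue": "#4682b4",
    "chocolate": "#d2691e",
}


def hex_to_rgb(hex_color: str) -> tuple:
    """Convert '#rgb' or '#rrggbb' to an (r, g, b) tuple of ints."""
    digits = hex_color.lstrip("#")
    if len(digits) == 3:  # expand shorthand such as #f00
        digits = "".join(ch * 2 for ch in digits)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def closest_color_name(hex_color: str, palette: dict = PALETTE) -> str:
    """Return the palette name nearest to hex_color in RGB space."""
    target = hex_to_rgb(hex_color)

    def squared_distance(name: str) -> int:
        candidate = hex_to_rgb(palette[name])
        return sum((c - t) ** 2 for c, t in zip(candidate, target))

    return min(palette, key=squared_distance)
```

With this palette, closest_color_name("#00aaee") gives "deepskyblue" and closest_color_name("#0769ad") gives "steelblue". Plain RGB distance is not perceptually uniform, so a perceptual color space may match human judgment better if the results still look off.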

How do diff/patch work and how safe are they?

二次信任 posted on 2019-11-30 13:49:40
Question: Regarding how they work, I was wondering about the low-level details: What will trigger a merge conflict? Is the context also used by the tools in order to apply the patch? How do they deal with changes that do not actually modify source code behavior? For example, swapping function definition places. Regarding safety, truth be told, the huge Linux kernel repository is a testament to their safety. But I am wondering about the following points: Are there any caveats/limitations regarding the tools

Text similarity algorithm

喜你入骨 posted on 2019-11-30 12:09:20
Question: I have two subtitle files. I need a function that tells whether they represent the same text or similar text. Sometimes there are comments like "The wind is blowing... the music is playing" in one file only, but 80% of the contents will be the same. In that case the function must return TRUE (the files represent the same text). And sometimes there are misspellings like 1 instead of l (one - L) as here: She 1eft the baggage. Of course, this means the function must return TRUE. My comments: The
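One way to approach this is a similarity ratio with a cutoff. The sketch below uses difflib.SequenceMatcher from the Python standard library, whose ratio is based on matching blocks rather than on edit distance; the 0.8 threshold is an assumption taken from the "80%" figure in the question, and the function names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so layout does not matter."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def same_text(a: str, b: str, threshold: float = 0.8) -> bool:
    """Return True when the two texts are at least `threshold` similar.

    The threshold is an assumed cutoff, not a value from any library.
    autojunk=False disables difflib's popularity heuristic, which can
    otherwise distort ratios on long inputs.
    """
    matcher = SequenceMatcher(None, normalize(a), normalize(b), autojunk=False)
    return matcher.ratio() >= threshold
```

For example, same_text("She left the baggage.", "She 1eft the baggage.") returns True, because only one of the 21 characters differs. Comparing whole files character by character can be slow, so for long subtitle files it may be more practical to compare lists of lines instead.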

Levenshtein distance in regular expression

ぃ、小莉子 posted on 2019-11-30 08:58:58
Question: Is it possible to include Levenshtein distance in a regular expression query? (Except by making a union of permutations, like this one to search for "hello" with Levenshtein distance 1: .ello | h.llo | he.lo | hel.o | hell. since this is stupid and unusable for larger Levenshtein distances.)

Answer 1: "Is it possible to include Levenshtein distance in a regular expression query?" No, not in a sane way. Implementing a Levenshtein distance algorithm, or using an existing one, is the way to go.

Answer 2:

How do diff/patch work and how safe are they?

给你一囗甜甜゛ posted on 2019-11-30 08:44:31
Regarding how they work, I was wondering about the low-level details: What will trigger a merge conflict? Is the context also used by the tools in order to apply the patch? How do they deal with changes that do not actually modify source code behavior? For example, swapping function definition places. Regarding safety, truth be told, the huge Linux kernel repository is a testament to their safety. But I am wondering about the following points: Are there any caveats/limitations regarding the tools that the user should be aware of? Have the algorithms been proven to not generate wrong results? If not,

Optimizing Levenshtein distance algorithm

*爱你&永不变心* posted on 2019-11-30 07:43:06
I have a stored procedure that uses Levenshtein distance to determine the result closest to what the user typed. The only thing really affecting the speed is the function that calculates the Levenshtein distance for all the records before selecting the record with the lowest distance (I've verified this by putting a 0 in place of the call to the Levenshtein function). The table has 1.5 million records, so even the slightest adjustment may shave off a few seconds. Right now the entire thing runs over 10 minutes. Here's the method I'm using: ALTER function dbo.Levenshtein ( @Source nvarchar(200)

Normalizing the edit distance

大兔子大兔子 posted on 2019-11-30 05:00:47
Question: Can we normalize the Levenshtein edit distance by dividing the distance by the length of the two strings? I am asking because, when we compare two strings of unequal length, the difference between their lengths is counted as well. For example: ed('has a', 'has a ball') = 5 and ed('has a', 'has a ball the is round') = 18. If we increase the length of the string, the edit distance increases even though the strings are similar. Therefore, I cannot set a value,
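One common normalization, shown here as a sketch and not as the only option, divides the distance by the length of the longer string. Since the Levenshtein distance can never exceed that length, the result falls between 0.0 (identical) and 1.0 (nothing in common). Both function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance, keeping only two rows of the table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # match or substitution
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Distance divided by the longer length; 0.0 for two empty strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest
```

A single cutoff on normalized_distance then behaves the same way for short and long strings. Dividing by the sum of both lengths is another choice; it simply compresses the range, so the cutoff would need to be chosen accordingly.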

Where can the documentation for python-Levenshtein be found online? [closed]

跟風遠走 posted on 2019-11-30 04:54:22
I've found a great Python library implementing Levenshtein functions (distance, ratio, etc.) at http://code.google.com/p/pylevenshtein/ but the project seems inactive and the documentation is nowhere to be found. I was wondering if anyone knows better than me and can point me to the documentation.

You won't have to generate the docs yourself. There's an online copy of the original Python Levenshtein API: http://www.coli.uni-saarland.de/courses/LT1/2011/slides/Python-Levenshtein.html Here is an example:

    # install with: pip install python-Levenshtein
    from Levenshtein import distance
    edit_dist =