levenshtein-distance

Finding Levenshtein distance on two strings

ぐ巨炮叔叔 submitted on 2019-12-24 07:48:57
Question: I am trying to implement Levenshtein distance in Java (in Eclipse) on the following two strings: "kruskal" and "causal". I took the idea from Wikipedia, but I don't know why my output is wrong. I need help finding my mistake(s).

    package il.ac.oranim.alg2016;

    public class OPT {
        public static void main(String[] args) {
            char[] t = {'k','r','u','s','k','a','l'};
            char[] s = {'c','a','u','s','a','l'};
            for (int i = 0; i <= s.length; i++) {
                for (int j = 0; j <= t.length; j++)
                    System.out.print(LevenshteinDistance(s,t)[i][j]+" "
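The asker's LevenshteinDistance method is cut off in this excerpt, so its bug cannot be identified here. As a point of comparison only, the following is a minimal Python sketch of the standard Wagner-Fischer table from the Wikipedia description; the function name `levenshtein_matrix` is an assumption made for illustration and is not the asker's code.

```python
def levenshtein_matrix(s, t):
    """Return the full (len(s)+1) x (len(t)+1) edit-distance table."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    # Turning a prefix of s into the empty string takes i deletions.
    for i in range(rows):
        d[i][0] = i
    # Turning the empty string into a prefix of t takes j insertions.
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d


# Build the table once, then print it row by row.
table = levenshtein_matrix("causal", "kruskal")
for row in table:
    print(" ".join(str(value) for value in row))
```

The bottom-right cell, `table[-1][-1]`, holds the distance, which is 3 for these two strings. The table is built once and then printed, rather than being recomputed for every cell as the loop in the question does.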

python - calculate orthographic similarity between words of a list

牧云@^-^@ submitted on 2019-12-24 02:13:19
Question: I need to calculate orthographic similarity (edit/Levenshtein distance) among words in a given corpus. As Kirill suggested below, I tried to do the following:

    import csv, itertools, Levenshtein
    import numpy as np

    # import the list of words from csv file
    path = '/Users/my path'
    file = path + 'file.csv'
    with open(file, 'rb') as f:
        reader = csv.reader(f)
        wordlist = list(reader)
    wordlist = np.array(wordlist)  # make it a np array
    wordlist2 = wordlist[:,0]  # subset the first column of the imported
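One way to score every pair of words is sketched below. To stay self-contained it uses a pure-Python distance function instead of the `Levenshtein` package imported in the question. The word list, the `similarity` normalisation (1 minus distance divided by the longer length), and all names are assumptions for illustration, not the asker's method.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance with unit costs, keeping only two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # match or substitution
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalisation: 1.0 for identical words, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical word list standing in for the first CSV column.
words = ["kitten", "sitting", "mitten"]
pairs = {(a, b): similarity(a, b) for a, b in combinations(words, 2)}
for (a, b), score in pairs.items():
    print(a, b, round(score, 3))
```

`combinations(words, 2)` visits each unordered pair once, which is enough because the unit-cost distance is symmetric.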

Levenshtein distance in Swift3

余生颓废 submitted on 2019-12-24 01:39:14
Question: I'm using a tutorial from Rosetta Code to calculate Levenshtein distance. It seems their code is in Swift 2, so I get the error "Binary operator '+' cannot be applied to operands of type '[Int]' and 'Repeated<String.CharacterView>'" when doing this: var cur = [i + 2] + empty, where let empty = repeatElement(s, count: 0). How can I go about this?

Answer 1: There were a couple of changes to make. The construction of the Array empty. enumerate() is now enumerated(). successor() doesn't exist anymore so I

How to modify Levenshtein algorithm, to know if it inserted, deleted, or substituted a character?

江枫思渺然 submitted on 2019-12-23 12:25:29
Question: I am trying to devise a spin-off of the Levenshtein algorithm in which I keep track of the transformations I applied to the string (inserted a, or substituted a for b).

Example: say I am computing the edit distance of "bbd" and "bcd". The edit distance will be 1 and the transformation will be "substitute b for c".

How would I approach this problem, since the implementations I've seen do not concern themselves with what kind of operation was applied, only with the total cost?
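One common approach is to keep the whole distance table and walk back from the bottom-right cell, recording which neighbouring cell each value came from. The Python sketch below assumes unit costs. The name `edit_script` and the tuple layout `(operation, source_char, target_char)` are illustrative choices, and when several optimal scripts exist it returns just one of them.

```python
def edit_script(source, target):
    """Return (distance, operations) that turn source into target."""
    m, n = len(source), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution

    # Backtrace: follow a path of cells that explains each value.
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and source[i - 1] == target[j - 1]
                and d[i][j] == d[i - 1][j - 1]):
            i, j = i - 1, j - 1  # characters match, nothing to record
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            ops.append(("substitute", source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", source[i - 1], None))
            i -= 1
        else:
            ops.append(("insert", None, target[j - 1]))
            j -= 1
    ops.reverse()  # the walk ran from the end of the strings to the start
    return d[m][n], ops


distance, operations = edit_script("bbd", "bcd")
print(distance, operations)
```

For the example in the question this yields a distance of 1 and a single substitute operation replacing 'b' with 'c'. The number of recorded operations always equals the distance, since matches are skipped.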

Optimize speed of Levenshtein distance of many words

廉价感情. submitted on 2019-12-23 01:28:39
Question: I have a cell array dictionary that contains a lot of words (ca. 15000). I want to compute the function strdist (which calculates the Levenshtein distance) for all pairs of words. I tried two ways, but both are really slow. What would be a more efficient solution? Here is my code (dict_keys is my cell array of length m):

1)
    matrix = sparse(m,m);
    for i = 1:m-1;
        matrix(i,:) = cellfun( @(u) strdist(dict_keys{i},u), dict_keys );
    end

2)
    matrix = sparse(m,m);
    for i = 1:m-1;
        for j = i+1:m

Get the most repeated similar fields in MySQL database

…衆ロ難τιáo~ submitted on 2019-12-22 15:42:02
Question: Let's assume we have a database like:

Actions_tbl:
--------------------------------------------------------
id | Action_name                              | user_id
--------------------------------------------------------
1  | John reads one book                      | 1
2  | reading the book by john                 | 1
3  | Joe is jumping over fire                 | 2
4  | reading another book                     | 2
5  | John reads the book in library           | 1
6  | Joe read a book                          | 2
7  | read a book                              | 3
8  | jumping with no reason is Ronald's habit | 3

Users_tbl:
-----------------------
user_id | user

Levenshtein distance symmetric?

一笑奈何 submitted on 2019-12-22 04:18:41
Question: I was told that Levenshtein distance is symmetric. When I used Google's diffMatchPatch tool, which computes Levenshtein distance among other things, the results did not suggest that it is symmetric, i.e., Levenshtein(x1,x2) is not equal to Levenshtein(x2,x1). Is Levenshtein distance not symmetric, or is there a problem with that particular implementation? Thanks.

Answer 1: Just looking at the basic algorithm, it definitely is symmetric given the same cost for the operations - the number of additions,
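The condition in the answer (equal costs for the operations) can be checked with a small Python sketch. The function `weighted_distance` and its cost parameters are assumptions made for illustration and say nothing about how diffMatchPatch computes its value.

```python
def weighted_distance(a, b, insert_cost=1, delete_cost=1, substitute_cost=1):
    """Cost of turning a into b with configurable per-operation costs."""
    previous = [j * insert_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        current = [i * delete_cost]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + delete_cost,    # drop a character of a
                current[j - 1] + insert_cost,  # add a character of b
                previous[j - 1] + (0 if ca == cb else substitute_cost),
            ))
        previous = current
    return previous[-1]


# Equal insert and delete costs: the order of the arguments does not matter.
print(weighted_distance("kitten", "sitting"),
      weighted_distance("sitting", "kitten"))

# Unequal costs: an insertion one way is a deletion the other way,
# so the two directions can differ.
print(weighted_distance("ab", "abc", insert_cost=2),
      weighted_distance("abc", "ab", insert_cost=2))
```

With unit costs every insertion in one direction corresponds to a deletion in the other at the same price, which is why the two results agree. Once insertions and deletions are priced differently, that correspondence no longer preserves the total.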

How to call Levenshtein Function using the values from two different tables in T-SQL

假装没事ソ submitted on 2019-12-22 01:42:11
Question: I am trying to find the Levenshtein distance between the columns of two different tables, TableA and TableB. Basically, I need to match ColumnA of TableA against all the elements of ColumnB in TableB and find the Levenshtein distance. I have created a Levenshtein function as follows:

    CREATE FUNCTION [Levenshtein] (@value1 [NVARCHAR](MAX), @value2 [NVARCHAR](MAX))
    RETURNS [INT]
    AS EXTERNAL NAME [FastenshteinAssembly].[Fastenshtein.Levenshtein].[Distance]
    GO

This is basically calling a Levenshtein dll

PHP - Compare multidimensional sub-arrays to each other and merge on similarity threshold

风流意气都作罢 submitted on 2019-12-21 22:57:41
Question: Introduction (this question was updated on 27 May 2018): I have one PHP multidimensional array containing 6 sub-arrays, each containing 20 sub-sub-arrays, each of which in turn contains 2 sub-sub-arrays: one is a string (header), the other an unspecified number of keywords (keywords). I am looking to compare each of the 120 sub-sub-arrays to the 100 other sub-sub-arrays contained in the remaining 5 sub-arrays. So that sub-sub-array 1 in sub-array 1 is compared to sub-array 1 to