How to calculate a Jaro-Winkler distance matrix of strings in Python?
I have a large array of hand-entered strings (names and record numbers) and I'm trying to find d
For anyone with a similar problem: one solution I just found is to copy the pairwise loop out of the pdist function and add a [0] to each jaro_winkler argument, which pulls the string out of its row in the numpy array.
Example:
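import numpy as np
from scipy.spatial.distance import squareform
# jaro_winkler comes from whichever string-similarity library you already use

# fname: 2-D, one string per row, e.g. [['john'], ['jon']]
X = np.asarray(fname, order='C')
m, n = X.shape
# condensed (upper-triangle) vector, same layout pdist returns
dm = np.zeros((m * (m - 1)) // 2, dtype=np.double)
k = 0
for i in range(m - 1):  # range replaces Python 2's xrange
    for j in range(i + 1, m):
        # [0] pulls the string out of its one-element row
        dm[k] = jaro_winkler(X[i][0], X[j][0])
        k += 1
dms = squareform(dm)  # expand to a square symmetric matrix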
Even though this works, I'd still like to learn whether there's a "right", computer-sciency way to do this with the pdist function. Thanks, and hope this helps someone!
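One way that might work with pdist directly: pdist expects a numeric 2-D array, so instead of passing the strings you can pass a column of row indices and let a callable metric look the strings up. The sketch below is only that, a sketch. The names string_distance and string_distance_matrix are made up for illustration, and the metric is a difflib-based stand-in (not Jaro-Winkler) so the example runs without a third-party string library. If your jaro_winkler returns a similarity score rather than a distance, you would probably want 1 - jaro_winkler(a, b) in its place.

```python
from difflib import SequenceMatcher

import numpy as np
from scipy.spatial.distance import pdist, squareform


def string_distance(a, b):
    # Stand-in metric (an assumption, NOT Jaro-Winkler): 1 - difflib ratio.
    # Swap the body for your own library's call, e.g. 1 - jaro_winkler(a, b).
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def string_distance_matrix(strings, metric=string_distance):
    # pdist needs numbers, so hand it one row index per string.
    idx = np.arange(len(strings)).reshape(-1, 1)
    # The callable receives two rows of idx; map them back to strings.
    condensed = pdist(
        idx, lambda u, v: metric(strings[int(u[0])], strings[int(v[0])])
    )
    # Expand the condensed vector into a square symmetric matrix.
    return squareform(condensed)


names = ["martha", "marhta", "dixon"]
dms = string_distance_matrix(names)
```

Note that pdist still calls the Python metric once per pair, so this is mainly tidier than the hand-written loop rather than faster.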