String Distance Matrix in Python using pdist

温柔的废话 2020-12-31 20:00

How to calculate Jaro Winkler distance matrix of strings in Python?

I have a large array of hand-entered strings (names and record numbers) and I'm trying to find d

3 answers
  •  天涯浪人
    2020-12-31 20:41

    You need to wrap the distance function in a callable that pdist can apply to pairs of array rows, as the following example with the Levenshtein distance shows:

    import numpy as np    
    from Levenshtein import distance
    from scipy.spatial.distance import pdist, squareform
    
    # my list of strings
    strings = ["hello","hallo","choco"]
    
    # reshape into a 2-D array of shape (M, N): M = 3 entries, N = 1 column, since pdist expects 2-D input
    transformed_strings = np.array(strings).reshape(-1,1)
    
    # calculate condensed distance matrix by wrapping the Levenshtein distance function
    distance_matrix = pdist(transformed_strings,lambda x,y: distance(x[0],y[0]))
    
    # get square matrix
    print(squareform(distance_matrix))
    
    # Output:
    # [[0. 1. 4.]
    #  [1. 0. 4.]
    #  [4. 4. 0.]]
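
Since the question asks about Jaro-Winkler rather than Levenshtein, here is a minimal pure-Python sketch of that metric together with a plain pairwise loop, in case no string-distance package is installed. The function names (`jaro_similarity`, `jaro_winkler_distance`, `distance_matrix`) are hypothetical and not taken from any library; the prefix scale of 0.1 and the prefix cap of 4 characters are the commonly used defaults and are assumed here.

```python
from itertools import combinations


def jaro_similarity(s1, s2):
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # characters count as matching only if they are at most `window` positions apart
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # count matched characters that appear in a different order
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler_distance(s1, s2, prefix_scale=0.1):
    """1 - Jaro-Winkler similarity, so 0.0 means identical strings."""
    sim = jaro_similarity(s1, s2)
    # boost the score for a shared prefix of up to 4 characters
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    sim += prefix * prefix_scale * (1.0 - sim)
    return 1.0 - sim


def distance_matrix(strings, metric):
    """Square, symmetric matrix of pairwise distances as nested lists."""
    n = len(strings)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = metric(strings[i], strings[j])
    return matrix


strings = ["hello", "hallo", "choco"]
for row in distance_matrix(strings, jaro_winkler_distance):
    print([round(d, 3) for d in row])
```

The same function should also work with the pdist approach above by wrapping it the same way, i.e. `lambda x, y: jaro_winkler_distance(x[0], y[0])`. For a large array, a compiled implementation from a string-distance package will likely be much faster than this sketch.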
    
