String Distance Matrix in Python using pdist

温柔的废话 2020-12-31 20:00

How to calculate Jaro Winkler distance matrix of strings in Python?

I have a large array of hand-entered strings (names and record numbers) and I'm trying to find d

3 answers
  •  余生分开走
    2020-12-31 20:26

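    Here's a concise solution that requires neither numpy nor scipy, only the Levenshtein package. Note that jaro_winkler returns a similarity (1.0 for identical strings), so the matrix below holds similarities; use 1 - jaro_winkler(a, b) if you need a distance: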

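    from Levenshtein import jaro_winkler

    data = ['Bob', 'Carl', 'Kristen', 'Calr', 'Doug']
    # Pairwise Jaro-Winkler scores for every (row, column) pair of strings
    dm = [[jaro_winkler(a, b) for b in data] for a in data]
    # Print each row with two decimals in fixed-width columns
    print('\n'.join(''.join(f'{item:6.2f}' for item in row) for row in dm))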
    
      1.00  0.00  0.00  0.00  0.53
      0.00  1.00  0.46  0.93  0.00
      0.00  0.46  1.00  0.46  0.00
      0.00  0.93  0.46  1.00  0.00
      0.53  0.00  0.00  0.00  1.00
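
If you want actual distances and only one computation per pair (the way pdist works on the upper triangle), something like the following might do. This is only a rough sketch with no third-party dependencies: `jaro_sim`, `jaro_winkler_sim` and `distance_matrix` are hypothetical helper names, the Jaro-Winkler code is a hand-written textbook version (not the Levenshtein package's implementation), and distance is assumed to mean 1 - similarity with the usual prefix weight of 0.1.

```python
from itertools import combinations


def jaro_sim(s, t):
    """Textbook Jaro similarity in [0, 1] (illustrative helper)."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Characters only match if they are at most `window` positions apart.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit = [False] * len(s)
    t_hit = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        lo, hi = max(0, i - window), min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order, counted in pairs.
    s_matched = [c for c, hit in zip(s, s_hit) if hit]
    t_matched = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_matched, t_matched)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler_sim(s, t, p=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    jaro = jaro_sim(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)


def distance_matrix(items):
    """Symmetric matrix of 1 - similarity; each pair is computed once."""
    n = len(items)
    dm = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dm[i][j] = dm[j][i] = 1.0 - jaro_winkler_sim(items[i], items[j])
    return dm


data = ['Bob', 'Carl', 'Kristen', 'Calr', 'Doug']
dm = distance_matrix(data)
print('\n'.join(''.join(f'{item:6.2f}' for item in row) for row in dm))
```

Here the diagonal is 0 and near-duplicates such as 'Carl' and 'Calr' get a small distance, which is the orientation most clustering routines expect. For a large array, the n*(n-1)/2 pair loop is still quadratic, so a compiled implementation of the scoring function is probably worth it.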
    
