How to calculate Levenshtein ratio/distance for rows in my column in python?

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-25 02:46:20

Question


I have a DataFrame with a single column containing 1000 rows. I need to compare every row with every other row and find the Levenshtein distance for each pair. How do I calculate that ratio or distance in Python?

I have a dataframe as following:

  #Df 
  StepDescription
  click confirm button when done
  you have logged on
  please log in to proceed
  click on confirm button
  Dolb was released successfully
  Enter your details
  validate the statement
  Aval was released sucessfully

How do I calculate the Levenshtein ratio for all of these?

This is the code I have written so far to iterate over the rows, but I don't know how to proceed after that.

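  import Levenshtein
  import pandas as pd
  # Raw string so the backslash in the Windows path is not read as an escape
  df = pd.read_csv(r'path\Data_TestDescription.csv')  # read_csv already returns a DataFrame
  for index, row in df.iterrows():
      pass  # unsure how to proceed from here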

Answer 1:


As requested in a comment, a percentage is wanted as well. I'll keep the accepted answer as it is and add just the new part:

import numpy as np
import pandas as pd
from Levenshtein import distance
from itertools import product

#df = ...

dist = [distance(*x) for x in product(df.StepDescription, repeat=2)]

dist_df = pd.DataFrame(np.array(dist).reshape(df.shape[0], df.shape[0]))
dist_df

    0   1   2   3   4   5   6   7
0   0  23  23  13  29  25  25  28
1  23   0  18  18  23  18  18  23
2  23  18   0  20  25  21  19  24
3  13  18  20   0  27  19  21  26
4  29  23  25  27   0  26  23   5
5  25  18  21  19  26   0  19  25
6  25  18  19  21  23  19   0  21
7  28  23  24  26   5  25  21   0

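# Each distance as a percentage of the smallest non-zero distance.
# Multiply before the floor division so the fraction is not truncated
# (23 // 5 * 100 would give 400 rather than 460).
dist_df_percentage = dist_df * 100 // min(x for x in dist if x > 0)
dist_df_percentage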

     0    1    2    3    4    5    6    7
0    0  460  460  260  580  500  500  560
1  460    0  360  360  460  360  360  460
2  460  360    0  400  500  420  380  480
3  260  360  400    0  540  380  420  520
4  580  460  500  540    0  520  460  100
5  500  360  420  380  520    0  380  500
6  500  360  380  420  460  380    0  420
7  560  460  480  520  100  500  420    0
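
The percentage above is scaled by the smallest non-zero distance, so it can exceed 100. If a similarity between 0 and 100 is wanted instead, one common normalization divides the distance by the length of the longer string. The following is only a sketch under that assumption: `levenshtein` and `similarity_percent` are hypothetical helper names written in plain Python so the example runs without extra packages, and the scores need not match the `ratio` function of the Levenshtein package, which uses a different formula.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep the rows short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_percent(a: str, b: str) -> float:
    """Assumed normalization: 100 * (1 - distance / length of the longer string)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


steps = [
    "click confirm button when done",
    "you have logged on",
    "please log in to proceed",
    "click on confirm button",
    "Dolb was released successfully",
    "Enter your details",
    "validate the statement",
    "Aval was released sucessfully",
]

# Nested list of similarity percentages, one row per step description
matrix = [[round(similarity_percent(a, b), 1) for b in steps] for a in steps]
```

The nested list can be passed to `pd.DataFrame(matrix)` to get the same tabular layout as the distance table above.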



Answer 2:


Finally, after trying many examples, I got the exact ratio (percentage) using fuzz.ratio from fuzzywuzzy:

from itertools import product
import numpy as np
import pandas as pd
from fuzzywuzzy import fuzz

#df = ...

# fuzz.ratio returns an integer similarity score from 0 to 100
dist = np.empty(df.shape[0]**2, dtype=int)
for i, x in enumerate(product(df.StepDescription, repeat=2)):
    dist[i] = fuzz.ratio(*x)
dist_df = pd.DataFrame(dist.reshape(-1, df.shape[0]))
# to_csv returns None when given a path, so there is nothing to assign
dist_df.to_csv('FuzzyRatio.csv', sep='\t')
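
With 1000 rows, `product(..., repeat=2)` scores one million ordered pairs, including each pair twice. When the scorer is symmetric, as Levenshtein distance is, it may be enough to score each unordered pair once and mirror the result. Below is a minimal sketch of that idea: `pairwise_scores` and `ratio_percent` are hypothetical names, and `difflib.SequenceMatcher` from the standard library is used only as a stand-in scorer. Its ratio is not guaranteed to be symmetric or to equal `fuzz.ratio`, so treat the mirroring as an assumption for that scorer.

```python
from difflib import SequenceMatcher
from itertools import combinations


def ratio_percent(a: str, b: str) -> int:
    """Stand-in scorer: difflib's similarity ratio scaled to 0-100."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def pairwise_scores(texts, scorer):
    """Build a square score matrix, calling scorer once per unordered pair.

    Assumes scorer(a, b) == scorer(b, a); the lower triangle is a mirror copy.
    """
    n = len(texts)
    scores = [[0] * n for _ in range(n)]
    for i in range(n):
        scores[i][i] = scorer(texts[i], texts[i])  # diagonal entries
    for i, j in combinations(range(n), 2):
        scores[i][j] = scores[j][i] = scorer(texts[i], texts[j])
    return scores


steps = [
    "click confirm button when done",
    "click on confirm button",
    "Dolb was released successfully",
    "Aval was released sucessfully",
]
score_matrix = pairwise_scores(steps, ratio_percent)
```

For n rows this makes n * (n + 1) / 2 scorer calls instead of n * n, which roughly halves the work for the 1000-row case.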


Source: https://stackoverflow.com/questions/47152344/how-to-calculate-levenshtein-ratio-distance-for-rows-in-my-column-in-python
