Python Fuzzy Matching (FuzzyWuzzy) - Keep only Best Match

渐次进展 2020-12-08 23:36

I'm trying to fuzzy match two CSV files, each containing one column of names that are similar but not identical.

My code so far is as follows:

impor         


        
3 Answers
  •  北荒 (original poster)
    2020-12-09 00:32

    fuzzywuzzy's process.extract() returns its matches sorted by score in descending order, so the best match comes first.

    To keep only the best match, set the limit argument to 1 so that extract() returns a single result. If its score is at least 60, write it to the CSV as you are doing now.

    Example:

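    from fuzzywuzzy import process

    # parse_csv, data, and writer come from the question's code.
    # For each row in the lookup file, keep only the single best match.
    for row in parse_csv("names_2.csv"):
        # limit=1 returns a one-element list holding the top-scoring match.
        # Unpacking three values works when data is a dict-like mapping;
        # with a plain list, extract() yields (match, score) pairs instead.
        for found, score, matchrow in process.extract(row, data, limit=1):
            if score >= 60:
                print('%d%% partial match: "%s" with "%s" ' % (score, row, found))
                Digi_Results = [row, score, found]
                writer.writerow(Digi_Results)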
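
The "keep only the best match above a threshold" logic can also be sketched without any third-party dependency. The block below is only an illustration, not fuzzywuzzy itself: it swaps in the standard library's difflib.SequenceMatcher as the scorer, and the names score, best_match, names_1, and names_2 are hypothetical, as is the sample data. In fuzzywuzzy, process.extractOne() plays a similar role by returning just the top match, and it accepts a score_cutoff argument.

```python
from difflib import SequenceMatcher


def score(a, b):
    """Return a 0-100 similarity score (a stand-in for a fuzzywuzzy scorer)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_match(query, choices, cutoff=60):
    """Return (choice, score) for the top-scoring choice, or None below cutoff."""
    if not choices:
        return None
    best = max(choices, key=lambda c: score(query, c))
    best_score = score(query, best)
    return (best, best_score) if best_score >= cutoff else None


# Hypothetical sample data standing in for the two CSV columns.
names_1 = ["Jonathan Smith", "Maria Garcia", "Wei Zhang"]
names_2 = ["Jon Smith", "Maria Garcia", "Qwerty"]

# One result per lookup name: the single best candidate, or None.
results = {name: best_match(name, names_1) for name in names_2}
for name, match in results.items():
    print(name, "->", match)
```

Returning None for names with no candidate at or above the cutoff keeps the filtering in one place, so the caller only has to skip None entries before writing rows to the output CSV.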
    
