I want to find out whether two strings are almost similar. For example, a string like 'Mohan Mehta' should match 'Mohan Mehte' and vice versa. Another example, string like 'Um
What you want is a string distance. There are many flavors, but I would recommend starting with the Levenshtein distance.
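To make the idea concrete, here is a minimal pure-Python sketch of the Levenshtein distance using the usual row-by-row dynamic programming. The function name `levenshtein` is just an illustrative choice, not something from a library; for real workloads a tested package is probably a better bet.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short

    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("Mohan Mehta", "Mohan Mehte"))  # 1
```

A distance of 1 on strings of length 11 suggests a near match; where to draw the line is up to you.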
You can use difflib.SequenceMatcher if you want something from the stdlib:
from difflib import SequenceMatcher
s_1 = 'Mohan Mehta'
s_2 = 'Mohan Mehte'
print(SequenceMatcher(a=s_1,b=s_2).ratio())
0.909090909091
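If you need a yes/no answer rather than a score, one option is to compare the ratio against a cutoff. The sketch below assumes a threshold of 0.85 and case-insensitive comparison; both are arbitrary choices you would tune on your own data, and `is_similar` is a made-up helper name.

```python
from difflib import SequenceMatcher


def is_similar(s1: str, s2: str, threshold: float = 0.85) -> bool:
    """Return True if the SequenceMatcher ratio reaches the threshold.

    The 0.85 default is an assumption for illustration, not a
    recommended value.
    """
    ratio = SequenceMatcher(a=s1.lower(), b=s2.lower()).ratio()
    return ratio >= threshold


print(is_similar("Mohan Mehta", "Mohan Mehte"))  # True
print(is_similar("Mohan Mehta", "Umesh Gupta"))  # False
```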
fuzzywuzzy is one of numerous libraries that you can install; it builds on the difflib module and can use python-Levenshtein for speed. You should also check out the Wikipedia page on approximate string matching (Approximate_string_matching).
You might want to look at NLTK (the Natural Language Toolkit), specifically the nltk.metrics package, which implements various string distance algorithms, including the Levenshtein distance mentioned already.
You could split the string and check to see if it contains at least one first/last name that is correct.
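A rough sketch of that idea: split both strings on whitespace and check whether they share at least one exact (case-insensitive) token. The helper name `shares_name_token` is hypothetical, and note that this is a loose test; any two people with the same first name would match.

```python
def shares_name_token(s1: str, s2: str) -> bool:
    """Return True if the two strings share at least one word,
    ignoring case."""
    tokens_1 = set(s1.lower().split())
    tokens_2 = set(s2.lower().split())
    return not tokens_1.isdisjoint(tokens_2)


print(shares_name_token("Mohan Mehta", "Mohan Mehte"))  # True
print(shares_name_token("Mohan Mehta", "Umesh Gupta"))  # False
```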
Another approach is to use a "phonetic algorithm":
A phonetic algorithm is an algorithm for indexing of words by their pronunciation.
For example, using the Soundex algorithm:
>>> import soundex
>>> s = soundex.getInstance()
>>> s.soundex("Umesh Gupta")
'U5213'
>>> s.soundex("Umash Gupte")
'U5213'
>>> s.soundex("Umesh Gupta") == s.soundex("Umash Gupte")
True
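The `soundex` module above is a third-party package. If you would rather not install anything, here is a hedged pure-Python sketch of the classic four-character American Soundex, applied word by word. It is not the same implementation as that package, so the codes differ (for example it yields `U520` and `G130` rather than a single `U5213`), and it assumes plain ASCII names. The function names are my own.

```python
# Map each consonant to its Soundex digit; vowels, h, w, y have no code.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of a single word."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with equal codes
        code = _CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so repeats get coded again
    return (result + "000")[:4]  # pad with zeros, truncate to 4


def sounds_alike(s1: str, s2: str) -> bool:
    """Compare two names word by word using their Soundex codes."""
    return [soundex(w) for w in s1.split()] == [soundex(w) for w in s2.split()]


print(soundex("Umesh"), soundex("Gupta"))          # U520 G130
print(sounds_alike("Umesh Gupta", "Umash Gupte"))  # True
```

Keep in mind that Soundex was designed around English pronunciation, so it may be a poor fit for some names.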
// calculate the similarity between 2 strings
public static double similarity(String s1, String s2) {
    String longer = s1, shorter = s2;
    if (s1.length() < s2.length()) { // longer should always have greater length
        longer = s2; shorter = s1;
    }
    int longerLength = longer.length();
    if (longerLength == 0) { return 1.0; /* both strings are zero length */ }
    /* // If you have StringUtils, you can use it to calculate the edit distance:
    return (longerLength - StringUtils.getLevenshteinDistance(longer, shorter)) /
                               (double) longerLength; */
    return (longerLength - editDistance(longer, shorter)) / (double) longerLength;
}

// Example implementation of the Levenshtein Edit Distance
// See http://rosettacode.org/wiki/Levenshtein_distance#Java
public static int editDistance(String s1, String s2) {
    s1 = s1.toLowerCase();
    s2 = s2.toLowerCase();

    int[] costs = new int[s2.length() + 1];
    for (int i = 0; i <= s1.length(); i++) {
        int lastValue = i;
        for (int j = 0; j <= s2.length(); j++) {
            if (i == 0)
                costs[j] = j;
            else {
                if (j > 0) {
                    int newValue = costs[j - 1];
                    if (s1.charAt(i - 1) != s2.charAt(j - 1))
                        newValue = Math.min(Math.min(newValue, lastValue),
                                costs[j]) + 1;
                    costs[j - 1] = lastValue;
                    lastValue = newValue;
                }
            }
        }
        if (i > 0)
            costs[s2.length()] = lastValue;
    }
    return costs[s2.length()];
}