I am trying to match a single search term against a dictionary of possible matches using a Levenshtein distance algorithm. The algorithm returns a distance expressed as the number of edit operations needed to transform one string into the other, and I want to turn that distance into a percentage similarity.
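For context, a minimal sketch of that setup, using the levenshtein gem and a hypothetical best_match helper with made-up dictionary words:

require 'levenshtein'

# Hypothetical helper: return the dictionary entry with the smallest
# edit distance to the search term.
def best_match(term, dictionary)
  dictionary.min_by { |candidate| Levenshtein.distance(term, candidate) }
end

best_match("kitten", ["sitting", "mitten", "written"])
# => "mitten"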
The maximum possible Levenshtein distance between two strings of lengths l1 and l2 is [l1, l2].max. That much is true, but dividing the distance by it does not give a useful similarity measure, as the examples below show:
gem install levenshtein diff-lcs
Diff::LCS.lcs "abc", "qwer"
=> []
Levenshtein.distance("abc", "qwer").to_f / [3, 4].max
=> 1.0
Diff::LCS.lcs "abc", "cdef"
=> ["c"]
Levenshtein.distance("abc", "cdef").to_f / [3, 4].max
=> 1.0
Diff::LCS.lcs "1234", "34567890"
=> ["3", "4"]
Levenshtein.distance("1234", "34567890").to_f / [4, 8].max
=> 1.0
Normalized Levenshtein distance doesn't look like a reliable way to compare strings as percentages: all three pairs come out as 1.0, i.e. 100% different, even though the last two pairs share common characters. I don't want to treat similar strings as 100% different.
Instead, I'd recommend basing the similarity on the longest common subsequence (LCS) between the two sequences, relative to their combined length:
require 'diff/lcs'

def get_similarity(sequence_1, sequence_2)
  # Internals.lcs maps each position of sequence_1 to its matching index
  # in sequence_2 (or nil), so compact.length is the LCS length.
  lcs_length = Diff::LCS::Internals.lcs(sequence_1, sequence_2).compact.length
  lcs_length.to_f * 2 / (sequence_1.length + sequence_2.length)
end
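Applied to the same pairs as above, this gives something like the following: 0.0 only when there is no common subsequence at all, and a non-zero similarity otherwise.

get_similarity("abc", "qwer")
=> 0.0
get_similarity("abc", "cdef")
=> 0.2857142857142857
get_similarity("1234", "34567890")
=> 0.3333333333333333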