What are some algorithms for comparing how similar two strings are?

故里飘歌 2020-11-30 17:45

I need to compare strings to decide whether they represent the same thing. This relates to case titles entered by humans where abbreviations and other small details may di

5 Answers
  •  星月不相逢
    2020-11-30 18:23

    What you're looking for are called string metric algorithms. There are a significant number of them, many with similar characteristics. Among the more popular:

    • Levenshtein Distance : The minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into the other. Strings do not have to be the same length.
    • Hamming Distance : The number of positions at which two equal-length strings differ.
    • Smith–Waterman : A family of algorithms for computing variable sub-sequence similarities.
    • Sørensen–Dice Coefficient : A similarity coefficient computed from the adjacent character pairs (bigrams) that two strings share.

    Have a look at these as well as others on the wiki page on the topic.
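
    As a rough illustration of three of the metrics listed above, here is a minimal Python sketch. The function names (levenshtein, hamming, dice_coefficient) are made up for this example and do not come from any particular library, and the Dice variant shown works on sets of bigrams, which is only one of several common formulations.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def dice_coefficient(a: str, b: str) -> float:
    """Sorensen-Dice coefficient over sets of adjacent character pairs."""
    pairs_a = {a[i:i + 2] for i in range(len(a) - 1)}
    pairs_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not pairs_a and not pairs_b:
        # Assumption for this sketch: strings too short to have bigrams
        # count as similar only when they are identical.
        return 1.0 if a == b else 0.0
    return 2 * len(pairs_a & pairs_b) / (len(pairs_a) + len(pairs_b))


print(levenshtein("kitten", "sitting"))    # 3
print(hamming("karolin", "kathrin"))       # 3
print(dice_coefficient("night", "nacht"))  # 0.25
```

    For human-entered titles, it usually helps to normalize both strings first (for example, lowercasing and collapsing whitespace) before applying any of these metrics.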
