What is the best algorithm for matching two strings containing fewer than 10 words in Latin script

刺人心 2021-02-04 09:35

I'm comparing song titles, mostly in Latin script (although not always). My aim is an algorithm that gives a high score if the two song titles seem to be the same title and a

5 answers
  •  感动是毒
    2021-02-04 10:26

    Have you looked at the Levenshtein distance?

    int org.apache.commons.lang.StringUtils.getLevenshteinDistance(String s, String t)
    

    Find the Levenshtein distance between two Strings.

    This is the number of changes needed to change one String into another, where each change is a single character modification (deletion, insertion or substitution).

    The previous implementation of the Levenshtein distance algorithm was from http://www.merriampark.com/ld.htm

    Chas Emerick has written an implementation in Java, which avoids an OutOfMemoryError which can occur when my Java implementation is used with very large strings. This implementation of the Levenshtein distance algorithm is from http://www.merriampark.com/ldjava.htm
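For song titles, a raw edit count is hard to compare across titles of different lengths, so one option is to turn it into a score between 0 and 1. The following is only a minimal sketch of that idea, not the Commons Lang implementation: the class name `TitleSimilarity`, the `similarity` helper, and the choice to lower-case and trim before comparing are all assumptions made for illustration.

```java
import java.util.Locale;

// Hypothetical helper class, not part of Apache Commons Lang.
final class TitleSimilarity {
    private TitleSimilarity() {}

    /** Levenshtein distance using two rolling rows of the DP table. */
    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];

        // Distance from the empty prefix of s to each prefix of t.
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // delete all i characters of s
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so that prev always holds the last completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    /**
     * Assumed normalization: 1.0 for identical titles (ignoring case and
     * surrounding whitespace), 0.0 when every character differs.
     */
    static double similarity(String a, String b) {
        String x = a.trim().toLowerCase(Locale.ROOT);
        String y = b.trim().toLowerCase(Locale.ROOT);
        int longest = Math.max(x.length(), y.length());
        if (longest == 0) {
            return 1.0; // two empty titles are treated as identical
        }
        return 1.0 - (double) levenshtein(x, y) / longest;
    }
}
```

With this sketch, `TitleSimilarity.levenshtein("kitten", "sitting")` returns 3 and `TitleSimilarity.similarity("Hey Jude", "hey jude ")` returns 1.0. Where to set the "same title" threshold is a judgment call that would need tuning on real data.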

    Anyway, I'm curious to know what you choose in this case!
