Damerau–Levenshtein distance algorithm in MySQL as a function

后端 未结 3 1252
旧巷少年郎
旧巷少年郎 2021-01-06 12:01

Does anyone know of a MySQL implementation of the Damerau–Levenshtein distance algorithm as a stored procedure/function that takes a single specified string as a parameter a

3条回答
  •  旧巷少年郎
    2021-01-06 12:38

    This seems to be an old topic, however should anyone look for a MYSQL implementation of Damerau-Levenshtein distance, here is my own implementation (based upon a simple Levenshtein found elsewhere on this site), which works fine for strings less than 255 characters long. The third parameter can be set to FALSE to retrieve the basic Levenshtein distance:

    CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255), dam BOOL)
    RETURNS INT
    DETERMINISTIC
    BEGIN
        DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
        DECLARE s1_char, s2_char CHAR;
        -- max strlen=255
        DECLARE cv0, cv1, cv2 VARBINARY(256);
        SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
        IF s1 = s2 THEN
            RETURN 0;
        ELSEIF s1_len = 0 THEN
            RETURN s2_len;
        ELSEIF s2_len = 0 THEN
            RETURN s1_len;
        ELSE
            WHILE j <= s2_len DO
                SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
            END WHILE;
            WHILE i <= s1_len DO
                SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
                WHILE j <= s2_len DO
                    SET c = c + 1;
                    SET s2_char = SUBSTRING(s2, j, 1);
                    IF s1_char = s2_char THEN
                        SET cost = 0; ELSE SET cost = 1;
                    END IF;
                    SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
                    IF c > c_temp THEN SET c = c_temp; END IF;
                    SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
                    IF c > c_temp THEN SET c = c_temp; END IF;
                    IF dam THEN
                        IF i>1 AND j>1 AND s1_char = SUBSTRING(s2, j-1, 1) AND s2_char = SUBSTRING(s1, i-1, 1) THEN
                            SET c_temp = CONV(HEX(SUBSTRING(cv2, j-1, 1)), 16, 10) + 1;
                            IF c > c_temp THEN SET c = c_temp; END IF;
                        END IF;
                    END IF;
                    SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
                END WHILE;
                IF dam THEN SET CV2 = CV1; END IF;
                SET cv1 = cv0, i = i + 1;
            END WHILE;
        END IF;
        RETURN c;
    END
    

提交回复
热议问题