Damerau–Levenshtein distance algorithm in MySQL as a function

Submitted by ☆樱花仙子☆ on 2019-11-30 19:59:35

Question


Does anyone know of a MySQL implementation of the Damerau–Levenshtein distance algorithm as a stored procedure/function that takes a single specified string as a parameter and looks for fuzzy matches of the string in a particular field within a particular table?

I have found various procedure/function code examples that compare two specified strings and work out the distance. However, first, they implement only the Levenshtein distance algorithm, not the Damerau–Levenshtein one; second, I'm not looking to compare two strings but to find fuzzy matches in a field of my choosing that are similar to my specified string.

I'm basically trying to put together a fuzzy keyword searcher in MySQL.


Answer 1:


The page "MySQL Levenshtein and Damerau-Levenshtein UDF's" lists several implementations of this algorithm.




Answer 2:


There is ongoing development on GitHub to modify Sean Collins's code so that it supports UTF-8 and is case-insensitive.

Example:

mysql> select damlevlim('camión', 'çamion', 6);

+--------------------------------------+
| damlevlim('camión', 'çamion', 6) |
+--------------------------------------+
|                                    0 |
+--------------------------------------+
1 row in set (0.00 sec)

This is especially useful when doing fuzzy matches:

mysql> select word, damlevlim(word, 'camion', 7) as dist from wordslist where damlevlim(word, 'camion', 7) <= 1 limit 2;

+--------+------+
| word   | dist |
+--------+------+
| camión |    0 |
| camios |    1 |
+--------+------+
2 rows in set (0.00 sec)



Answer 3:


This seems to be an old topic. However, should anyone be looking for a MySQL implementation of the Damerau-Levenshtein distance, here is my own implementation (based upon a simple Levenshtein function found elsewhere on this site), which works fine for strings shorter than 255 characters. The third parameter can be set to FALSE to retrieve the basic Levenshtein distance:

-- DELIMITER is a mysql client command; omit these two DELIMITER lines
-- if your client sends the whole statement in one piece.
DELIMITER $$

CREATE FUNCTION levenshtein(s1 VARCHAR(255), s2 VARCHAR(255), dam BOOL)
RETURNS INT
DETERMINISTIC
BEGIN
    DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
    DECLARE s1_char, s2_char CHAR;
    -- Each matrix row is stored as one byte per cell, hence max strlen = 255.
    -- cv0 = row being built, cv1 = previous row, cv2 = row before that.
    DECLARE cv0, cv1, cv2 VARBINARY(256);
    SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
    IF s1 = s2 THEN
        RETURN 0;
    ELSEIF s1_len = 0 THEN
        RETURN s2_len;
    ELSEIF s2_len = 0 THEN
        RETURN s1_len;
    ELSE
        -- Row 0: distance from the empty prefix of s1 to each prefix of s2.
        WHILE j <= s2_len DO
            SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
        END WHILE;
        WHILE i <= s1_len DO
            SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
            WHILE j <= s2_len DO
                -- Insertion: cell to the left + 1.
                SET c = c + 1;
                SET s2_char = SUBSTRING(s2, j, 1);
                IF s1_char = s2_char THEN
                    SET cost = 0;
                ELSE
                    SET cost = 1;
                END IF;
                -- Substitution: diagonal cell + cost.
                SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
                IF c > c_temp THEN SET c = c_temp; END IF;
                -- Deletion: cell above + 1.
                SET c_temp = CONV(HEX(SUBSTRING(cv1, j + 1, 1)), 16, 10) + 1;
                IF c > c_temp THEN SET c = c_temp; END IF;
                IF dam THEN
                    -- Transposition of two adjacent characters: cell two rows up, two columns left + 1.
                    IF i > 1 AND j > 1 AND s1_char = SUBSTRING(s2, j - 1, 1) AND s2_char = SUBSTRING(s1, i - 1, 1) THEN
                        SET c_temp = CONV(HEX(SUBSTRING(cv2, j - 1, 1)), 16, 10) + 1;
                        IF c > c_temp THEN SET c = c_temp; END IF;
                    END IF;
                END IF;
                SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
            END WHILE;
            IF dam THEN SET cv2 = cv1; END IF;
            SET cv1 = cv0, i = i + 1;
        END WHILE;
    END IF;
    RETURN c;
END$$

DELIMITER ;
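
To tie this back to the original question (finding fuzzy matches in a column rather than comparing two literals), here is a minimal usage sketch. It assumes the levenshtein function above has been created, and it borrows the wordslist table and word column from the earlier answer purely for illustration; the @needle and @max_dist variables and the threshold of 2 are assumptions as well.

```sql
-- Sanity check: swapping two adjacent characters is one edit with dam = TRUE,
-- but two substitutions with dam = FALSE.
SELECT levenshtein('ca', 'ac', TRUE)  AS damerau,  -- 1
       levenshtein('ca', 'ac', FALSE) AS plain;    -- 2

-- Fuzzy keyword search over a column.
-- Assumed schema: wordslist(word VARCHAR(255)).
SET @needle = 'camion';
SET @max_dist = 2;

SELECT word, levenshtein(word, @needle, TRUE) AS dist
FROM wordslist
-- Cheap prefilter: the distance is never smaller than the length difference,
-- so rows that differ too much in length can be skipped without calling the function.
WHERE ABS(CHAR_LENGTH(word) - CHAR_LENGTH(@needle)) <= @max_dist
HAVING dist <= @max_dist
ORDER BY dist, word
LIMIT 10;
```

The function is evaluated once for every row that survives the prefilter, and no index on word can help with that part, so on a large table a query like this is likely to be slow. Note also that the function only considers swaps of two adjacent characters against the row two steps back, which corresponds to the restricted (optimal string alignment) form of the Damerau-Levenshtein distance.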


Source: https://stackoverflow.com/questions/14232851/damerau-levenshtein-distance-algorithm-in-mysql-as-a-function
