问题
Does anyone know of a MySQL implementation of the Damerau–Levenshtein distance algorithm as a stored procedure/function that takes a single specified string as a parameter and looks for fuzzy matches of the string in a particular field within a particular table?
I have found various procedure/function code examples that compares two specified strings and works out the distance, but firstly this is only the Levenshtein distance algorithm, and not the Damerau–Levenshtein one, and secondly, I'm not looking to compare two strings but find fuzzy matches in a field of my choosing that are similar to my specified string.
I'm basically trying to put together a fuzzy keyword searcher in MySQL.
回答1:
In MySQL Levenshtein and Damerau-Levenshtein UDF’s you have several implementations of this algorithm.
回答2:
There is an ongoing development in Github to modify Sean Collins code so it has UTF-8 support and is case-insensitive.
Example:
mysql> select damlevlim('camión', 'çamion', 6);
+--------------------------------------+
| damlevlim('camión', 'çamion', 6) |
+--------------------------------------+
| 0 |
+--------------------------------------+
1 row in set (0.00 sec)
This is specially useful when doing fuzzy matches.
mysql> select word,damlevlim(word, 'camion') as dist from wordslist where damlevlim(word, 'camion', 7)<1 limit 2;
+--------+------+
| word | dist |
+--------+------+
| camión | 0 |
| camios | 1 |
+--------+------+
2 row in set (0.00 sec)
回答3:
This seems to be an old topic, however should anyone look for a MYSQL implementation of Damerau-Levenshtein distance, here is my own implementation (based upon a simple Levenshtein found elsewhere on this site), which works fine for strings less than 255 characters long. The third parameter can be set to FALSE to retrieve the basic Levenshtein distance:
CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255), dam BOOL)
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
DECLARE s1_char, s2_char CHAR;
-- max strlen=255
DECLARE cv0, cv1, cv2 VARBINARY(256);
SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
IF s1 = s2 THEN
RETURN 0;
ELSEIF s1_len = 0 THEN
RETURN s2_len;
ELSEIF s2_len = 0 THEN
RETURN s1_len;
ELSE
WHILE j <= s2_len DO
SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
END WHILE;
WHILE i <= s1_len DO
SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
WHILE j <= s2_len DO
SET c = c + 1;
SET s2_char = SUBSTRING(s2, j, 1);
IF s1_char = s2_char THEN
SET cost = 0; ELSE SET cost = 1;
END IF;
SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
IF c > c_temp THEN SET c = c_temp; END IF;
SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
IF c > c_temp THEN SET c = c_temp; END IF;
IF dam THEN
IF i>1 AND j>1 AND s1_char = SUBSTRING(s2, j-1, 1) AND s2_char = SUBSTRING(s1, i-1, 1) THEN
SET c_temp = CONV(HEX(SUBSTRING(cv2, j-1, 1)), 16, 10) + 1;
IF c > c_temp THEN SET c = c_temp; END IF;
END IF;
END IF;
SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
END WHILE;
IF dam THEN SET CV2 = CV1; END IF;
SET cv1 = cv0, i = i + 1;
END WHILE;
END IF;
RETURN c;
END
来源:https://stackoverflow.com/questions/14232851/damerau-levenshtein-distance-algorithm-in-mysql-as-a-function