MySQL - Which Hash Algo should I use for this?

南旧 2020-12-20 09:20

I have a big rhyme database with 360000 words (entries). Every word has a category (for example: 'sheet' and 'meet' have the category 'eet'). A query to find suitable

1 answer
  •  鱼传尺愫
    2020-12-20 09:53

    Perhaps you could implement the Levenshtein algorithm in MySQL as a stored function. Below is an example; I hope it helps:

    DELIMITER //
    CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255) )
      RETURNS INT
      DETERMINISTIC
      BEGIN
        DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
        DECLARE s1_char CHAR;
        -- max strlen=255
        DECLARE cv0, cv1 VARBINARY(256);
        SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
        IF s1 = s2 THEN
          RETURN 0;
        ELSEIF s1_len = 0 THEN
          RETURN s2_len;
        ELSEIF s2_len = 0 THEN
          RETURN s1_len;
        ELSE
          WHILE j <= s2_len DO
            SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
          END WHILE;
          WHILE i <= s1_len DO
            SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
            WHILE j <= s2_len DO
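              -- Left neighbour in the current row, plus 1
              SET c = c + 1;
              IF s1_char = SUBSTRING(s2, j, 1) THEN
                SET cost = 0;
              ELSE
                SET cost = 1;
              END IF;
              -- Diagonal neighbour in the previous row, plus the substitution cost
              SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
              IF c > c_temp THEN SET c = c_temp; END IF;
              -- Upper neighbour in the previous row, plus 1
              SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
              IF c > c_temp THEN SET c = c_temp; END IF;
              -- Append the minimum to the current row
              SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;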
            END WHILE;
            SET cv1 = cv0, i = i + 1;
          END WHILE;
        END IF;
        RETURN c;
      END //
    DELIMITER ;
    

    Source: http://www.artfulsoftware.com/infotree/queries.php#552 (fixed by adding DELIMITER //)
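To check the stored function's results outside the database, the same two-row scheme can be sketched in PHP. This is only a minimal sketch: the name `levenshtein_rows` is hypothetical, and it assumes single-byte (ASCII) words because it indexes the strings byte by byte.

```php
<?php
// Two-row Levenshtein distance, mirroring the cv0/cv1 rows of the stored function.
// levenshtein_rows() is a hypothetical helper name used only for this sketch.
// Assumption: inputs are single-byte (ASCII) strings, since indexing is per byte.
function levenshtein_rows(string $s1, string $s2): int
{
    $len1 = strlen($s1);
    $len2 = strlen($s2);

    // Same early exits as the stored function.
    if ($s1 === $s2) {
        return 0;
    }
    if ($len1 === 0) {
        return $len2;
    }
    if ($len2 === 0) {
        return $len1;
    }

    // Previous row: distances from the empty prefix of $s1 to each prefix of $s2.
    $prev = range(0, $len2);

    for ($i = 1; $i <= $len1; $i++) {
        // First cell of the current row: distance from $i characters to the empty string.
        $curr = [$i];
        for ($j = 1; $j <= $len2; $j++) {
            $cost = ($s1[$i - 1] === $s2[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $curr[$j - 1] + 1,     // left neighbour in the current row
                $prev[$j] + 1,         // upper neighbour in the previous row
                $prev[$j - 1] + $cost  // diagonal neighbour plus substitution cost
            );
        }
        // The current row becomes the previous row for the next character.
        $prev = $curr;
    }

    return $prev[$len2];
}
```

PHP's built-in `levenshtein()` is also byte-based, so for ASCII input it should agree with this sketch; comparing the two on a few words is a quick sanity check before trusting the SQL version.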

    Test example script

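    <?php
    class DB{
        private $db;
        private $dbhost;
        private $dbname;
        private $dbuser;
        private $dbpass;

        public function __construct($host,$dbname,$user,$pass){
            $this->dbhost = $host;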
            $this->dbname = $dbname;
            $this->dbuser = $user;
            $this->dbpass = $pass;
        }
    
        private function connect(){
            if (!$this->db instanceof PDO){
                $this->db = new PDO('mysql:dbname='.$this->dbname.';host='.$this->dbhost, $this->dbuser, $this->dbpass);
                $this->db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
            }
        }
    
        //A Model method for the levenshtein_query.
        public function levenshtein_query($word,$dist){
            $this->connect();
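            //Bind the distance as an integer instead of interpolating it into the SQL string
            $sql = "SELECT `word` FROM `words` WHERE levenshtein( :word ,`word` ) BETWEEN 0 AND :dist";
            $statement = $this->db->prepare($sql);
            $statement->bindValue(':word', $word, PDO::PARAM_STR);
            $statement->bindValue(':dist', (int) $dist, PDO::PARAM_INT);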
            $statement->execute();
            return $statement->fetchAll(PDO::FETCH_ASSOC);
        }
    }
    
    //Initialise the model class
    $model = new DB('localhost','test_db','root','');
    
    //The Word posted
    $word = 'eet';
    $result = $model->levenshtein_query($word,1);
    
    print_r($result);
    /*
    //The Result
    Array
    (
        [0] => Array
            (
                [word] => bet
            )
    
        [1] => Array
            (
                [word] => get
            )
    
        [2] => Array
            (
                [word] => jet
            )
    
        [3] => Array
            (
                [word] => let
            )
    
        [4] => Array
            (
                [word] => met
            )
    
        [5] => Array
            (
                [word] => pet
            )
    
        [6] => Array
            (
                [word] => set
            )
    
        [7] => Array
            (
                [word] => vet
            )
    
        [8] => Array
            (
                [word] => wet
            )
    
        [9] => Array
            (
                [word] => yet
            )
    
        [10] => Array
            (
                [word] => meet
            )
    
    )
    
    */
    

    Perhaps it's of some interest.
