Rabin Karp string matching algorithm

前端 未结 2 1183
醉酒成梦
醉酒成梦 2020-12-16 15:45

I\'ve seen this Rabin Karp string matching algorithm in the forums on the website and I\'m interested in trying to implement it but I was wondering If anyone could tell me w

2条回答
  •  眼角桃花
    2020-12-16 15:50

    About the magic numbers Paul's answer is pretty clear.

    As far as the code is concerned, Rabin Karp's principal idea is to perform an hash comparison between a sliding portion of the string and the pattern.

    The hash cannot be computed each time on the whole substrings, otherwise the computation complexity would be quadratic O(n^2) instead of linear O(n).

    Therefore, a rolling hash function is applied, such as at each iteration only one character is needed to update the hash value of the substring.

    So, let's comment your code:

    for (int i = 0; i < B.Length; i++)
    {
        siga = (siga * D + (ulong)A[i]) % Q;
        sigb = (sigb * D + (ulong)B[i]) % Q;
    }
    if (siga == sigb)
    {
        Console.WriteLine(string.Format(">>{0}<<{1}", A.Substring(0, B.Length), A.Substring(B.Length)));
        return;
    }
    

    ^ This piece computes the hash of pattern B (sigb), and the hashcode of the initial substring of A of the same length of B. Actually it's not completely correct because hash can collide¹ and so, it is necessary to modify the if statement : if (siga == sigb && A.Substring(0, B.Length) == B).

    ulong pow = 1;
    for (int k = 1; k <= B.Length - 1; k++)
        pow = (pow * D) % Q;
    

    ^ Here's computed pow that is necessary to perform the rolling hash.

    for (int j = 1; j <= A.Length - B.Length; j++)
    {
        siga = (siga + Q - pow * (ulong)A[j - 1] % Q) % Q;
        siga = (siga * D + (ulong)A[j + B.Length - 1]) % Q;
        if (siga == sigb)
        {
            if (A.Substring(j, B.Length) == B)
            {
                Console.WriteLine(string.Format("{0}>>{1}<<{2}", A.Substring(0, j),
                                                                    A.Substring(j, B.Length),
                                                                    A.Substring(j + B.Length)));
                return;
            }
        }
    }
    

    ^ Finally, the remaining string (i.e. from the second character to end), is scanned updating the hash value of the A substring and compared with the hash of B (computed at the beginning).

    If the two hashes are equal, the substring and the pattern are compared¹ and if they're actually equal a message is returned.


    ¹ Hash values can collide; hence, if two strings have different hash values they're definitely different, but if the two hashes are equal they can be equal or not.

提交回复
热议问题