Manacher's algorithm (algorithm to find longest palindrome substring in linear time)

走了就别回头了 2020-12-22 16:02

After spending about 6-8 hours trying to digest Manacher's algorithm, I am ready to throw in the towel. But before I do, here is one last shot in the dark: can anyone e

10 answers
  •  盖世英雄少女心
    2020-12-22 16:37

    Full Article: http://www.zrzahid.com/longest-palindromic-substring-in-linear-time-manachers-algorithm/

    First of all, let's look closely at a palindrome to find some interesting properties. For example, S1 = "abaaba" and S2 = "abcba" are both palindromes, but what is the non-trivial (i.e. not length or characters) difference between them? S1 is centered on the invisible gap between i=2 and i=3 (a non-existent position!), whereas S2 is centered on the character at i=2 (i.e. c). To handle the center of a palindrome uniformly regardless of odd or even length, let's transform the string by inserting a special character $ between the characters and at both ends. Then S1="abaaba" and S2="abcba" become T1="$a$b$a$a$b$a$", centered at i=6, and T2="$a$b$c$b$a$", centered at i=5. Now every center exists as a real index and the length is consistently 2*n+1, where n is the length of the original string. For example,

                        i'          c           i           
          -----------------------------------------------------
          | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12|
          ----------------------------------------------------- 
       T1=| $ | a | $ | b | $ | a | $ | a | $ | b | $ | a | $ |
          -----------------------------------------------------
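
    The transformation above can be sketched in a few lines of Python. This is only an illustration: the function name transform and the sep parameter are made up here, and sep is assumed not to occur in the input.

```python
def transform(s: str, sep: str = "$") -> str:
    """Put sep between all characters and at both ends: 'ab' -> '$a$b$'.

    Assumption for this sketch: sep does not appear in s.
    """
    # An empty input still gets a single separator, so len(t) == 2*n + 1 holds.
    return sep + sep.join(s) + sep if s else sep


t1 = transform("abaaba")
t2 = transform("abcba")

# Every palindrome now has a real center index, whatever its original parity.
print(t1, len(t1), len(t1) // 2)  # $a$b$a$a$b$a$ 13 6
print(t2, len(t2), len(t2) // 2)  # $a$b$c$b$a$ 11 5
```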
    

    Next, observe that, by the symmetry of a (transformed) palindrome T around its center c, T[c-k] = T[c+k] for 0<= k<= c. That is, positions c-k and c+k are mirrors of each other. To put it another way, for an index i to the right of center c, the mirror index i' lies to the left of c such that c-i'=i-c => i'=2*c-i, and vice versa. That is,

    For each position i on the right of center c of a palindromic substring, the mirror position of i is, i'=2*c-i, and vice versa.
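
    As a quick sanity check of the mirror formula, here is a small Python sketch. The helper name mirror is an assumption of this example, and the string is the transformed T1 from the figure.

```python
def mirror(i: int, c: int) -> int:
    """Index on the opposite side of center c, at the same distance as i."""
    return 2 * c - i


t = "$a$b$a$a$b$a$"  # transformed "abaaba"
c = len(t) // 2      # the whole string is a palindrome centered at 6

# Every position right of c holds the same character as its mirror on the left.
for i in range(c + 1, len(t)):
    assert t[i] == t[mirror(i, c)]

print(mirror(9, c))  # 3, the pair i=9 / i'=3 marked in the figure
```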

    Let us define an array P[0..2*n] such that P[i] equals the length of the palindrome centered at i. Note that this length is measured in characters of the original string (ignoring the special $ characters), which is the same as the number of positions the palindrome reaches on each side of i in T. Also let min and max be the leftmost and rightmost boundaries of a palindromic substring centered at c. So, min=c-P[c] and max=c+P[c]. For example, for the palindrome S="abaaba", the following figure shows the transformed palindrome T, mirror center c=6, length array P[0..12], min=c-P[c]=6-6=0, max=c+P[c]=6+6=12 and two sample mirrored indices i and i'.

          min           i'          c           i           max
          -----------------------------------------------------
          | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12|
          ----------------------------------------------------- 
        T=| $ | a | $ | b | $ | a | $ | a | $ | b | $ | a | $ |
          -----------------------------------------------------
        P=| 0 | 1 | 0 | 3 | 0 | 1 | 6 | 1 | 0 | 3 | 0 | 1 | 0 |
          -----------------------------------------------------
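
    To check a P table by hand, a brute-force version is enough. The sketch below is not Manacher's algorithm: it simply expands around every center of T, which is quadratic in the worst case. The function name palindrome_lengths_naive is made up for this example.

```python
def palindrome_lengths_naive(s: str, sep: str = "$") -> list[int]:
    """Compute P by expanding around each center of the transformed string.

    Brute force (O(n^2) worst case), meant only for verifying small tables.
    """
    t = sep + sep.join(s) + sep if s else sep
    p = [0] * len(t)
    for i in range(len(t)):
        k = 0
        # Grow the radius while both neighbours exist and match.
        while i - k - 1 >= 0 and i + k + 1 < len(t) and t[i - k - 1] == t[i + k + 1]:
            k += 1
        p[i] = k
    return p


print(palindrome_lengths_naive("abaaba"))
# [0, 1, 0, 3, 0, 1, 6, 1, 0, 3, 0, 1, 0]
```

    Because "abaaba" is itself a palindrome, the resulting table is symmetric around c=6.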
    

    With such a length array P, we can find the length of longest palindromic substring by looking into the max element of P. That is,

    P[i] is the length of a palindromic substring with center at i in the transformed string T, ie. center at i/2 in the original string S; Hence the longest palindromic substring would be the substring of length P[imax] starting from index (imax-P[imax])/2 such that imax is the index of maximum element in P.
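
    Reading the answer off P can be sketched as follows. The function name longest_from_lengths is illustrative, and the P values are the ones for S="babaabca" used in this answer.

```python
def longest_from_lengths(s: str, p: list[int]) -> str:
    """Recover the longest palindromic substring of s from its length array p."""
    # Index of the (first) maximum element of P in the transformed string.
    i_max = max(range(len(p)), key=p.__getitem__)
    # Map back to the original string: start = (i_max - P[i_max]) / 2.
    start = (i_max - p[i_max]) // 2
    return s[start:start + p[i_max]]


s = "babaabca"
p = [0, 1, 0, 3, 0, 3, 0, 1, 4, 1, 0, 1, 0, 1, 0, 1, 0]
print(longest_from_lengths(s, p))  # baab (i_max=8, start=(8-4)/2=2)
```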

    Let us draw a similar figure in the following for our non-palindromic example string S="babaabca".

                           min              c               max
                           |----------------|-----------------|
          --------------------------------------------------------------------
     idx= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16|
          --------------------------------------------------------------------- 
        T=| $ | b | $ | a | $ | b | $ | a | $ | a | $ | b | $ | c | $ | a | $ |
          ---------------------------------------------------------------------
        P=| 0 | 1 | 0 | 3 | 0 | 3 | 0 | 1 | 4 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
          ---------------------------------------------------------------------
    

    The question is how to compute P efficiently. The symmetry property suggests the following cases, in which we can compute P[i] from the previously computed P[i'] at the mirror index i' on the left and so skip a lot of comparisons. Let's suppose that we have a reference palindrome (the first palindrome) to start with.

    1. A third palindrome whose center is within the right side of a first palindrome will have exactly the same length as that of a second palindrome anchored at the mirror center on the left side, if the second palindrome is within the bounds of the first palindrome by at least one character.
      For example in the following figure with the first palindrome centered at c=8 and bounded by min=4 and max=12, length of the third palindrome centered at i=9 (with mirror index i'= 2*c-i = 7) is, P[i] = P[i'] = 1. This is because the second palindrome centered at i' is within the bounds of first palindrome. Similarly, P[10] = P[6] = 0.
      
      
                                            |----3rd----|
                                    |----2nd----|        
                             |-----------1st Palindrome---------|
                             min          i'  c   i           max
                             |------------|---|---|-------------|
            --------------------------------------------------------------------
       idx= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16|
            --------------------------------------------------------------------- 
          T=| $ | b | $ | a | $ | b | $ | a | $ | a | $ | b | $ | c | $ | a | $ |
            ---------------------------------------------------------------------
          P=| 0 | 1 | 0 | 3 | 0 | 3 | 0 | 1 | 4 | ? | ? | ? | ? | ? | ? | ? | ? |
            ---------------------------------------------------------------------
      
      Now, how do we check this case? Note that, by symmetry, the length of segment [min..i'] equals the length of segment [i..max]. Also note that the 2nd palindrome lies within the 1st palindrome by at least one character iff its left edge is strictly to the right of min, the left boundary of the 1st palindrome. That is,
      
              i'-P[i'] > min
              =>P[i']-i' < -min (negate)
              =>P[i'] < i'-min 
              =>P[i'] < max-i [(max-i)=(i'-min) due to symmetric property].
      
      Combining all the facts in case 1,
      P[i] = P[i'], if (max-i) > P[i']
    2. If the second palindrome meets or extends beyond the left bound of the first palindrome, then the third palindrome is guaranteed to have at least the length from its own center to the right outermost character of the first palindrome. This length is the same from the center of the second palindrome to the left outermost character of the first palindrome.
      For example, in the following figure the second palindrome, centered at i'=5, extends beyond the left bound of the first palindrome, so we can't say P[i]=P[i']. But the length of the third palindrome centered at i=11 is at least the distance from its center i=11 to the right bound max=12 of the first palindrome centered at c. That is, P[i]>=1. The third palindrome can be extended past max if and only if the character immediately past max matches its mirror with respect to i, and we continue this check outward. In this case T[13]!=T[9], so it can't be extended and P[i] = 1.
                                                          
                    |-------2nd palindrome------|   |----3rd----|---?    
                             |-----------1st Palindrome---------|
                             min  i'          c           i   max
                             |----|-----------|-----------|-----|
            --------------------------------------------------------------------
       idx= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16|
            --------------------------------------------------------------------- 
          T=| $ | b | $ | a | $ | b | $ | a | $ | a | $ | b | $ | c | $ | a | $ |
            ---------------------------------------------------------------------
          P=| 0 | 1 | 0 | 3 | 0 | 3 | 0 | 1 | 4 | 1 | 0 | ? | ? | ? | ? | ? | ? |
            ---------------------------------------------------------------------
      
      So, how do we check this case? It is simply the failed check for case 1. That is, the second palindrome meets or extends past the left edge of the first palindrome iff
      
              i'-P[i'] <= min
              =>P[i']-i' >= -min [negate]
              =>P[i'] >= i'-min 
              =>P[i'] >= max-i [(max-i)=(i'-min) due to symmetric property]. 
      
      That is, P[i] is at least (max-i) if (max-i) <= P[i'].
      If the palindrome centered at i does expand past max, then we have a new (extended) palindrome and hence a new center c=i. Update max to the rightmost boundary of the new palindrome.
      Combining all the facts in case 1 and case 2, we arrive at a very neat little formula:
      
              Case 1: P[i] = P[i'],  if (max-i) > P[i']
              Case 2: P[i]>=(max-i), if (max-i) <= P[i'], i.e. (max-i) = min(P[i'], max-i)
      
      That is, P[i]=min(P[i'], max-i) when the third palindrome is not extendable past max. Otherwise, we have a new third palindrome centered at c=i with new max=i+P[i].
    3. Neither the first nor second palindrome provides information to help determine the palindromic length of a fourth palindrome whose center is outside the right side of the first palindrome.
      That is, we can't determine P[i] in advance if i>max, so we start from zero:
      P[i] = 0, if max-i < 0
      Combining all the cases, we arrive at the formula:
      P[i] = max>i ? min(P[i'], max-i) : 0. This is only the starting value: we then try to expand by matching the characters at i-P[i]-1 and i+P[i]+1 around the center i. When we hit a mismatch, and i+P[i] has moved past max, we set the new center c=i and the new max=i+P[i].
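
    Putting the three cases together, the whole procedure might look like the Python sketch below. It is one possible reading of the steps above rather than the code from the linked article; the names manacher, right (for max, to avoid shadowing the Python built-in) and sep are choices made for this example.

```python
def manacher(s: str, sep: str = "$") -> tuple[str, list[int]]:
    """Return (longest palindromic substring of s, length array P).

    Sketch under the assumption that sep does not appear in s.
    """
    t = sep + sep.join(s) + sep if s else sep
    n = len(t)
    p = [0] * n
    c = 0      # center of the reference (first) palindrome
    right = 0  # its right boundary, called max in the text

    for i in range(n):
        # Cases 1 and 2: start from min(P[i'], max-i); case 3: start from 0.
        if right > i:
            p[i] = min(p[2 * c - i], right - i)

        # Try to expand around i. In case 1 the first comparison fails at once.
        while i - p[i] - 1 >= 0 and i + p[i] + 1 < n and t[i - p[i] - 1] == t[i + p[i] + 1]:
            p[i] += 1

        # The palindrome at i reaches past max: it becomes the new reference.
        if i + p[i] > right:
            c, right = i, i + p[i]

    i_max = max(range(n), key=p.__getitem__)
    start = (i_max - p[i_max]) // 2
    return s[start:start + p[i_max]], p


best, p = manacher("babaabca")
print(best)  # baab
print(p)     # [0, 1, 0, 3, 0, 3, 0, 1, 4, 1, 0, 1, 0, 1, 0, 1, 0]
```

    The running time is linear because right never decreases and every successful comparison in the while loop pushes it one step further, so the total number of successful comparisons is bounded by the length of T.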

    Reference: algorithm description in wiki page
