String manipulation: calculate the “similarity of a string with its suffixes”

挽巷 2020-12-16 07:01

For two strings A and B, we define the similarity of the strings to be the length of the longest prefix common to both strings. For example, the similarity of strings \"abc\

2 Answers
  • 2020-12-16 07:11

    Read this link about the Z-algorithm first. Here is an O(n) solution, based on the algorithm from the link, implemented in Python:

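    def z_func(s):
        """Return the sum of similarities of s with all of its suffixes."""
        # z[i] = length of the longest common prefix of s and s[i:]
        z = [0]*len(s)
        # [l, r] is the rightmost segment found so far that matches a prefix of s
        l, r = 0, 0
        for i in range(1,len(s)):
            if i<=r:
                # i lies inside the known match, so reuse the value computed for i-l
                z[i] = min(r-i+1, z[i-l])
            # extend the match by direct character comparison
            while (i + z[i] < len(s) and s[z[i]] == s[i + z[i]]):
                z[i] += 1
            if z[i]+i-1 > r:
                l, r = i, z[i]+i-1
        # z[0] is left at 0, so add len(s) for the string matched against itself
        return sum(z)+len(s)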
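
A quick way to sanity-check a linear-time result is to compare it against a brute-force version written straight from the definition in the question. The sketch below is only an illustration: the name `similarity_sum_naive` is made up here, and the function runs in O(n^2) in the worst case, so it is meant for testing small inputs, not as a replacement.

```python
def similarity_sum_naive(s):
    """Sum, over every suffix of s, of its longest common prefix with s."""
    total = 0
    for i in range(len(s)):
        # Compare s with the suffix starting at i, character by character.
        k = 0
        while i + k < len(s) and s[k] == s[i + k]:
            k += 1
        total += k
    return total


# Suffixes of "ababaa" contribute 6 + 0 + 3 + 0 + 1 + 1.
print(similarity_sum_naive("ababaa"))  # 11
```

For "ababaa" the suffix starting at index 0 is the whole string (similarity 6), and the suffixes starting at 2, 4 and 5 contribute 3, 1 and 1.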
    
  • 2020-12-16 07:17

    You want to consider suffix arrays. The suffix array of a word is the array of the starting indices of its suffixes, sorted by the lexicographical order of those suffixes. In the linked Wikipedia article, the algorithms compute the LCP (longest common prefix) as they compute the suffix array. You can compute this in O(n) by exploiting the similarities with suffix trees, as shown in this paper.

    EXAMPLE: Your string is ababaa, so the suffix array looks like this:

    5 | a
    4 | aa
    2 | abaa
    0 | ababaa
    3 | baa
    1 | babaa
    

    where the number on the left is the index at which the suffix begins. Now it's pretty easy to compute prefixes, since everything is stored lexicographically.
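
As a rough sketch of how the sorted order helps, the code below builds the suffix array by plain sorting, computes the LCP of each pair of neighbours, and then walks outward from the position of the full string while keeping a running minimum. This is an assumption-laden illustration, not the linear-time construction mentioned above: the naive sort is roughly O(n^2 log n), and the names `suffix_array`, `lcp` and `similarity_sum_via_suffix_array` are invented for this example.

```python
def suffix_array(s):
    # Naive construction: sort suffix start indices by the suffix text.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp(a, b):
    # Length of the longest common prefix of two strings.
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def similarity_sum_via_suffix_array(s):
    n = len(s)
    if n == 0:
        return 0
    sa = suffix_array(s)
    # adj[j] = LCP of the suffixes at sorted positions j and j + 1.
    adj = [lcp(s[sa[j]:], s[sa[j + 1]:]) for j in range(n - 1)]
    pos = sa.index(0)  # sorted position of the whole string
    total = n          # the string matches itself completely
    # The LCP of two suffixes is the minimum adjacent LCP between them,
    # so keep a running minimum while moving away from pos.
    cur = n
    for j in range(pos - 1, -1, -1):
        cur = min(cur, adj[j])
        total += cur
    cur = n
    for j in range(pos, n - 1):
        cur = min(cur, adj[j])
        total += cur
    return total


print(suffix_array("ababaa"))                     # [5, 4, 2, 0, 3, 1]
print(similarity_sum_via_suffix_array("ababaa"))  # 11
```

The printed suffix array matches the table above, and the total of 11 comes from 6 for the string itself plus 3, 1 and 1 for the suffixes sorted before it.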

    As a side note, this is closely related to the longest common substring problem. To practice for your next interview, think about ways to solve that efficiently.
