Longest common substring via suffix array: do we really need unique sentinels?

Submitted by 烂漫一生 on 2019-12-08 04:40:59

Question


I am reading about LCP arrays and their use, in conjunction with suffix arrays, in solving the "Longest common substring" problem. This video states that the sentinels used to separate individual strings must be unique, and not be contained in any of the strings themselves.

Unless I am mistaken, the reason is that when we construct the LCP array (by counting how many leading characters adjacent suffixes have in common), we must not count the sentinel in the case where both suffixes being compared have a sentinel at the same index.

This means we can write code like this:

set count = 0
for each index i in the shorter suffix
    if suffix_1[i] != suffix_2[i]
        stop
    increment count of common characters
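To make the comparison concrete, here is a minimal Python sketch of it. The name `common_prefix_len` is made up for illustration, and the sketch assumes the count should stop at the first mismatch, which is how an LCP value is normally defined. The two calls at the end show the behaviour the unique sentinels are meant to prevent.

```python
def common_prefix_len(suffix_1: str, suffix_2: str) -> int:
    """Count the leading characters two suffixes share."""
    count = 0
    # zip stops at the end of the shorter suffix
    for c1, c2 in zip(suffix_1, suffix_2):
        if c1 != c2:
            break  # first mismatch ends the common prefix
        count += 1
    return count


# Distinct sentinels ('#' and '$'): the match cannot run through a separator.
print(common_prefix_len("ab#cab$", "ab$"))  # 2
# One shared sentinel ('#'): the separator itself gets counted.
print(common_prefix_len("ab#ab#", "ab#"))   # 3
```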

To make this work, though, we need to jump through some hoops to ensure the sentinels are unique, which I asked about here.

Would it not be simpler to implement if we used one sentinel character for every separator and just counted the characters in common, stopping as soon as we reach that sentinel, like this:

set sentinel = '#'
set count = 0
for each index i in the shorter suffix
    if suffix_1[i] != suffix_2[i] or suffix_1[i] == sentinel
        return count
    increment count of common characters
return count
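As a rough sketch of this idea for the two-string case, the Python below stops the comparison at a shared sentinel and uses it to find a longest common substring. All names here are hypothetical, `'#'` is assumed not to occur in either input, and the suffix array is built by naive sorting purely to keep the example short. It illustrates the proposal; it is not a claim that it matches the unique-sentinel approach in every setting.

```python
SENTINEL = "#"  # assumption: never occurs in the input strings


def common_prefix_len_until_sentinel(suffix_1: str, suffix_2: str) -> int:
    """Count shared leading characters, stopping at a mismatch or the sentinel."""
    count = 0
    for c1, c2 in zip(suffix_1, suffix_2):
        if c1 != c2 or c1 == SENTINEL:
            break
        count += 1
    return count


def longest_common_substring(s: str, t: str) -> str:
    """Longest common substring of two strings via a naively built suffix array."""
    text = s + SENTINEL + t + SENTINEL
    # Naive construction: sort suffix start positions by the suffix text.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    best = ""
    for i, j in zip(suffix_array, suffix_array[1:]):
        # Only compare neighbours that start in different input strings.
        if (i <= len(s)) == (j <= len(s)):
            continue
        n = common_prefix_len_until_sentinel(text[i:], text[j:])
        if n > len(best):
            best = text[i:i + n]
    return best


print(longest_common_substring("abcde", "xbcdy"))  # bcd
print(longest_common_substring("ab", "ab"))        # ab
```

In the second call the suffixes `ab#` and `ab#ab#` are neighbours in the sorted order and share three characters including the `#`; the sentinel check caps the count at 2.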

Or, am I missing something fundamental here?

Source: https://stackoverflow.com/questions/57711769/longest-common-substring-via-suffix-array-do-we-really-need-unique-sentinels
