Implementation of string pattern matching using Suffix Array and LCP(-LR)

太阳男子 2020-12-16 00:00

Over the last few weeks I have been trying to figure out how to efficiently find a string pattern within another string.

I found out that for a long time, the most efficient way

4 Answers
  •  情话喂你
    2020-12-16 00:23

    The term that can help you is "enhanced suffix array": a suffix array combined with additional arrays (such as the LCP table and the child table) so that it can replace a suffix tree.

    Some implementations:

    https://code.google.com/p/esaxx/ ESAXX

    http://bibiserv.techfak.uni-bielefeld.de/mkesa/ MKESA

    ESAXX seems to do what you want, and it includes an example, enumSubstring.cpp, that shows how to use it.


    The referenced paper mentions a useful property (4.2). Since Stack Overflow does not support math markup, there is no point in copying it here.
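Here is the one fact the segment tree in the code relies on. I can't confirm this is exactly the paper's property 4.2; it is the standard suffix-array identity. Write S_p for the suffix starting at position p (0-based), SA for the sorted order of the suffixes, and n for the text length:

```latex
\begin{aligned}
\mathrm{LCP}[k] &= \mathrm{lcp}\bigl(S_{\mathrm{SA}[k-1]},\, S_{\mathrm{SA}[k]}\bigr), && 1 \le k < n,\\
\mathrm{lcp}\bigl(S_{\mathrm{SA}[i]},\, S_{\mathrm{SA}[j]}\bigr) &= \min_{i < k \le j} \mathrm{LCP}[k], && 0 \le i < j < n.
\end{aligned}
```

For S = banana: SA = (5, 3, 1, 0, 4, 2) and LCP = (0, 1, 3, 0, 0, 2), so lcp(S_5, S_1) = lcp(a, anana) = min(LCP[1], LCP[2]) = min(1, 3) = 1. A binary search over SA needs exactly such values for its current bounds, which is why a minimum structure over LCP (LCP-LR) can supply them.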

    I've done a quick implementation; it uses a segment tree:

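    #include <algorithm>  // std::min
    #include <vector>

    // Both arrays are 1-based (index 0 unused).
    std::vector<int> LCP;     // LCP.assign(N + 1, 0); then fill LCP[1..N]
    std::vector<int> LCP_LR;  // LCP_LR.assign(4 * N, 0); 4 * N nodes always suffice, so the size is O(N)

    // init: buildLCP_LR(1, 1, N);
    // Every node the build visits is overwritten, so no "max int" pre-fill is needed
    // (memset could not do it anyway: it sets bytes, not ints).
    // With mid = (1 + N) / 2:
    // LCP_LR[1] == min(LCP[1..N])
    // LCP_LR[2] == min(LCP[1..mid])
    // LCP_LR[3] == min(LCP[mid+1..N])

    // node i covers some range [low..high]; its two halves are
    //   left  = LCP_LR[2 * i]
    //   right = LCP_LR[2 * i + 1]
    // ..etc
    void buildLCP_LR(int index, int low, int high)
    {
        if (low == high)
        {
            LCP_LR[index] = LCP[low];   // leaf: a single LCP entry
            return;
        }

        int mid = (low + high) / 2;

        buildLCP_LR(2 * index, low, mid);
        buildLCP_LR(2 * index + 1, mid + 1, high);

        // an inner node stores the minimum over its whole range
        LCP_LR[index] = std::min(LCP_LR[2 * index], LCP_LR[2 * index + 1]);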
    }
    
