kmp | 易学教程

[String Algorithms] Revisiting KMP and the AC Automaton


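I learned these two things a long time ago and have forgotten most of them, so this is a refresher.

Given a string S (the text) and several substrings (the patterns), how do we find where those substrings occur in S? This is where the KMP algorithm comes in.

I will skip the overview and go straight to the principle. For example, take the two strings below, with the text (S) on top and the pattern (T) underneath, and start matching from the beginning.

S: abcabdabd
T: abcabc

At some position the characters differ:

S: abcab d abd
T: abcab c

We then shift the pattern to the right, because the two letters "ab" repeat.

S: abcabdabd
T:    abcabc

Continue matching...

This right shift is driven by the next array, which determines where the pattern resumes after a mismatch.

How to compute the next array: it is simple, so I will not go into detail.

void nextst()
{
    int j=0,k=-1;      // j scans the pattern s2, k is the length of the current border
    nexts[0]=-1;       // sentinel: no fallback from the first character
    while(j<s2.length()){
        if(k==-1||s2[j]==s2[k]){
            nexts[++j]=++k;
        }
        else k=nexts[k];   // mismatch: fall back to a shorter border
    }
}

Look closely and you will see that this is just the pattern being matched against itself. There is not much else to say. I am lazy. Template:

#include<bits/stdc++.h>
using namespace std;
const int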
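The excerpt stops before its template, so the following is only a minimal sketch of how a next array in the same -1 convention can drive a search. The names `build_next` and `kmp_search` are hypothetical and are not taken from the original post.

```cpp
#include <cassert>
#include <string>
#include <vector>

// next[j] = length of the longest proper border of t[0..j-1]; next[0] = -1.
// Same convention as nextst() in the excerpt, but returning the array.
std::vector<int> build_next(const std::string& t) {
    std::vector<int> next(t.size() + 1);
    int j = 0, k = -1;
    next[0] = -1;
    while (j < (int)t.size()) {
        if (k == -1 || t[j] == t[k]) next[++j] = ++k;
        else k = next[k];
    }
    return next;
}

// Returns the 0-based start position of every occurrence of t in s.
std::vector<int> kmp_search(const std::string& s, const std::string& t) {
    assert(!t.empty());  // assumption of this sketch: the pattern is non-empty
    std::vector<int> next = build_next(t), hits;
    int n = s.size(), m = t.size();
    int i = 0, j = 0;
    while (i < n) {
        if (j == -1 || s[i] == t[j]) { ++i; ++j; }  // advance both pointers
        else j = next[j];                           // shift the pattern right
        if (j == m) {                               // full match ending at i - 1
            hits.push_back(i - m);
            j = next[j];                            // keep going, overlaps allowed
        }
    }
    return hits;
}
```

With the strings from the example, `kmp_search("abcabdabd", "abcabc")` finds nothing, while `kmp_search("abcabcabc", "abcabc")` reports positions 0 and 3.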

The KMP Algorithm


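KMP solves the problem of checking whether one string occurs inside another. It avoids the repeated work of brute-force matching, which shortens the search and improves the time complexity. As an example, search for abab in abaababc:

0 1 2 3 4 5 6 7    index[]
a b a a b a b c    s[]    // matching fails at s[3]!=t[3]; look in t for the nearest character equal to the one at i-1
a b a b            t[]

0 1 2 3 4 5 6 7    index[]
a b a a b a b c    s[]    // t[0] and s[1] certainly differ, and this is what can be optimized: the nearest a is t[0], so align t[0] with s[2], where t[2] used to be
    a b a b        t[]

0 1 2 3 4 5 6 7    index[]
a b a a b a b c    s[]
      a b a b      t[]    // then s[3]!=t[1]; shifting one more step gives a successful match

The above only sketches one place where time can be saved. Comparing it with the brute-force algorithm magnifies the advantage and makes the real KMP easier to understand. Brute force matches t from its beginning against s[i], one character at a time. For example, if a match starting at i=1 fails after reaching i=6, the next round restarts from i=2, then i=3, and so on. In KMP, once matching from i=1 has reached i=6, i is not reset to 2. In other words, i never moves backward; instead j, the index into t[], falls back. That is the purpose of KMP. It is equivalent to, in the three steps above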
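To make the contrast concrete, here is a small sketch of both approaches side by side. The function names and the `fail` array are illustrative choices, not the original author's code. In `naive_find` the text position is effectively reset after every failure, while in `kmp_find` the text index `i` only moves forward and the pattern index `j` falls back.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Brute force: after a failure at alignment `start`, restart at start + 1.
int naive_find(const std::string& s, const std::string& t) {
    assert(!t.empty());
    int n = s.size(), m = t.size();
    for (int start = 0; start + m <= n; ++start) {
        int j = 0;
        while (j < m && s[start + j] == t[j]) ++j;
        if (j == m) return start;
    }
    return -1;
}

// KMP: i never moves backward; only j falls back through fail[].
int kmp_find(const std::string& s, const std::string& t) {
    assert(!t.empty());
    int n = s.size(), m = t.size();
    std::vector<int> fail(m, 0);  // fail[j] = longest proper border of t[0..j]
    for (int j = 1, k = 0; j < m; ++j) {
        while (k > 0 && t[j] != t[k]) k = fail[k - 1];
        if (t[j] == t[k]) ++k;
        fail[j] = k;
    }
    for (int i = 0, j = 0; i < n; ++i) {
        while (j > 0 && s[i] != t[j]) j = fail[j - 1];  // pattern index falls back
        if (s[i] == t[j]) ++j;
        if (j == m) return i - m + 1;                   // first occurrence
    }
    return -1;
}
```

For s = abaababc and t = abab, both functions return 3. In `kmp_find` the mismatch at s[3] sends j from 3 to 1 and then to 0, which corresponds to the two shifts in the walkthrough.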

KMP String Matching


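First write out the next array of the pattern, i.e. for each prefix, the maximum length at which a proper prefix equals a suffix. Take abaabba as an example. Index i of the next array refers to the prefix of the pattern that ends at index i:

next[0] = 0    a
next[1] = 0    ab
next[2] = 1    aba        a
next[3] = 1    abaa       a
next[4] = 2    abaab      ab
next[5] = 0    abaabb
next[6] = 1    abaabba    a

Then compare the pattern with the text, keeping an index into each. On a mismatch, set j = next[j-1], then compare the pattern character at index j with the text character that just failed to match.

Source: https://www.cnblogs.com/duansiyue/p/11393749.html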
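As a sanity check on the table, the sketch below computes the same array in two ways: directly from the definition, and with the usual linear-time fallback. Both function names are made up for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Definition: longest proper prefix of p that is also a suffix of p.
int border_by_definition(const std::string& p) {
    for (int len = (int)p.size() - 1; len >= 1; --len)
        if (p.compare(0, len, p, p.size() - len, len) == 0) return len;
    return 0;
}

// Linear-time version: next[i] is the border length of t[0..i].
std::vector<int> prefix_function(const std::string& t) {
    assert(!t.empty());
    std::vector<int> next(t.size(), 0);
    for (int i = 1, k = 0; i < (int)t.size(); ++i) {
        while (k > 0 && t[i] != t[k]) k = next[k - 1];  // try a shorter border
        if (t[i] == t[k]) ++k;
        next[i] = k;
    }
    return next;
}
```

For abaabba this gives 0 0 1 1 2 0 1. The last entry is 1 because the whole string both starts and ends with a.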

Extended KMP: Luogu P5410 Template


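Problem link: https://www.luogu.org/problem/P5410

Problem: given two strings a and b, find for every suffix of a the length of its longest common prefix with b.

Analysis: a bare extended KMP (also known as the Z-algorithm) problem.

This blog explains it fairly well: https://www.luogu.org/blog/lc-2018-Canton/solution-p5410

A few points in it are problematic, though, mainly in case 2. First, at the outset S[K+L] is certainly after p, which is fairly obvious. Second, in case 2 the sequences under the red, blue and green lines are not all the same: red and green are still equal, but blue is not. Check this for yourself.

#include<bits/stdc++.h>
using namespace std;
typedef long long ll;
const int maxn=1e5+7;//symbols were added between words, so the bound is enlarged slightly
const ll inf=1e18;
#define meminf(a) memset(a,0x3f,sizeof(a))
#define mem0(a) memset(a,0,sizeof(a));
char a[maxn],b[maxn];
int nxt[maxn],extend[maxn];//nxt[i] is the longest common prefix length of b[i...len] and b; extend[i] is the longest common prefix length of a[i...len] and b
void getnxt(){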
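The excerpt cuts off at `getnxt()`, so the block below does not reconstruct it. It is a sketch of a different, simpler route to the same two arrays: run a standard Z-function on the concatenation b + separator + a. The separator `'\x01'` is an assumption of this sketch and must not occur in either string.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// z[i] = length of the longest common prefix of s and s[i..]; z[0] = |s|.
std::vector<int> z_function(const std::string& s) {
    int n = s.size();
    std::vector<int> z(n, 0);
    for (int i = 1, l = 0, r = 0; i < n; ++i) {
        if (i < r) z[i] = std::min(r - i, z[i - l]);           // reuse the known box [l, r)
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) ++z[i];  // extend naively
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    if (n > 0) z[0] = n;
    return z;
}

// extend[i] = longest common prefix of a[i..] and b, via z(b + sep + a).
std::vector<int> extend_by_concat(const std::string& a, const std::string& b) {
    assert(!b.empty());
    const char sep = '\x01';  // assumed absent from both a and b
    std::vector<int> z = z_function(b + sep + a);
    return std::vector<int>(z.begin() + b.size() + 1, z.end());
}
```

Here `z_function(b)` plays the role of `nxt` and `extend_by_concat(a, b)` the role of `extend`. The cost is extra memory for the concatenated string.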

The KMP Algorithm Explained in Detail


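KMP, also called a pattern-matching algorithm, quickly decides whether string b is a substring of string a. Let a have length N and b have length M; KMP then runs in O(N+M) time.

Before explaining KMP, here is an easier method for this kind of problem: enumerate every element $a_i$ of a, and for each one compare $a_i$ with $b_1$, $a_{i+1}$ with $b_2$, ..., $a_{i+M-1}$ with $b_M$. If all of them are equal, b is a substring of a. The time complexity is O(NM). This is clearly too slow, so we need KMP to solve the problem more efficiently. Hashing can also solve it, but KMP is somewhat better.

If b is a substring of a, then a clearly contains segments that match prefixes of b, and when such a segment has length M, b is a substring of a. We therefore define an array f, where $f_i$ is the length of the longest substring of a ending at i that matches a prefix of b.

How do we do the matching? If the substring of a of length j ending at i currently matches the prefix of b of length j, compare $a_{i+1}$ with $b_{j+1}$. If they are equal, extend the match; if not, shrink j and keep matching.

How do we shrink j? Decreasing j one step at a time is clearly too slow. Observe that when a[i-j+1~i] matches b[1~j], if b[1~k] matches b[j-k+1~j] and b[1~l] matches b[k-l+1~k], then b[1~l] also matches b[j-l+1~j]. This is because if b[1~k] matches b[j-k+1~j]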
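A minimal sketch of the f array described above, written with 0-based indices (so `f[i]` here corresponds to $f_{i+1}$ in the text). The helper array `border` and the function name are assumptions of this illustration. b is a substring of a exactly when some entry of f equals the length of b.

```cpp
#include <cassert>
#include <string>
#include <vector>

// f[i] = length of the longest prefix of b that ends at a[i] (0-based).
std::vector<int> match_lengths(const std::string& a, const std::string& b) {
    assert(!b.empty());
    int n = a.size(), m = b.size();
    std::vector<int> border(m, 0);  // border[j] = longest proper border of b[0..j]
    for (int j = 1, k = 0; j < m; ++j) {
        while (k > 0 && b[j] != b[k]) k = border[k - 1];
        if (b[j] == b[k]) ++k;
        border[j] = k;
    }
    std::vector<int> f(n, 0);
    for (int i = 0, j = 0; i < n; ++i) {
        if (j == m) j = border[m - 1];                    // a full match cannot grow further
        while (j > 0 && a[i] != b[j]) j = border[j - 1];  // shrink j by jumping along borders
        if (a[i] == b[j]) ++j;
        f[i] = j;
    }
    return f;
}
```

The `while` loop is the "shrink j" step: instead of decreasing j by one, it jumps straight to the next shorter prefix of b that can still end at the current position.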

SCU 4438 Censor | A KMP Variant Problem


Problem link

Censor

frog is now an editor to censor so-called sensitive words (敏感词). She has a long text P. Her job is relatively simple -- just to find the first occurrence of sensitive word w and remove it. frog repeats over and over again. Help her do the tedious work.

Input
The input consists of multiple tests. For each test: The first line contains 1 string w. The second line contains 1 string p. (1≤length of w,p≤5⋅10^6, w, p consists of only lowercase letter)

Output
For each test, write 1 string which denotes the censored text.

Sample Input
abc
aaabcbc
b
bbb
abc
ab

Sample Output
a

ab

Problem summary:
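The excerpt ends before the solution, so what follows is only one common way to approach this kind of problem, not necessarily the original author's. The idea is to keep the surviving characters on a stack together with the KMP state reached after each of them. When the state reaches |w|, pop |w| characters and resume from the state stored at the new top. The function name `censor` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Repeatedly deletes occurrences of w from p, left to right, in O(|w| + |p|).
std::string censor(const std::string& w, const std::string& p) {
    assert(!w.empty());
    int m = w.size();
    std::vector<int> fail(m, 0);  // fail[i] = longest proper border of w[0..i]
    for (int i = 1, k = 0; i < m; ++i) {
        while (k > 0 && w[i] != w[k]) k = fail[k - 1];
        if (w[i] == w[k]) ++k;
        fail[i] = k;
    }
    std::string out;         // surviving characters, used as a stack
    std::vector<int> state;  // state[i] = matched length after out[i]
    for (char c : p) {
        int j = state.empty() ? 0 : state.back();  // resume from the stack top
        while (j > 0 && c != w[j]) j = fail[j - 1];
        if (c == w[j]) ++j;
        out.push_back(c);
        state.push_back(j);
        if (j == m) {        // the top m characters spell w: remove them
            out.resize(out.size() - m);
            state.resize(state.size() - m);
        }
    }
    return out;
}
```

On the three sample tests this gives "a", the empty string, and "ab".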

2019 Summer Training, August 17: KMP and Extended KMP, Trie, AC Automaton


Extended KMP

Purpose: find the longest common prefix of each suffix s[i; |s|] of the text s with the pattern t, for i = 0, 1, …, |s| - 1. The code is as follows:

void get_next(char t[maxn]){
    int d = (int)strlen(t);
    nex[0] = d;                       // t shares its whole length with itself
    int i = 0;
    while(i + 1 < d && t[i] == t[i + 1]) ++i;
    nex[1] = i;
    int p = 1;                        // start of the match that reaches furthest right
    for(int i = 2; i < d; i++){
        if(nex[i - p] + i < nex[p] + p) nex[i] = nex[i - p];
        else{
            int j = p + nex[p] - i;
            if(j < 0) j = 0;
            while(i + j < d && t[i + j] == t[j]) j++;
            nex[i] = j;
            p = i;
        }
    }
}
void exkmp(char s[maxn], char t[maxn]){
    int i = 0;
    int d1 = (int)strlen(s), d2 = (int)strlen(t);
    while(i < d1 && i < d2 && s[i] == t[i]) i++;
    ext[0] = i;
    int p = 0;
    for(int i = 1; i < d1; i++){
        if(nex
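Since `exkmp` is cut off, the block below is not a completion of it. It is an independent sketch of the same two-pass idea on `std::string`: first the Z-array of the pattern, then a second sweep over the text that reuses it through a window `[l, r)` of the text known to match a prefix of the pattern. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// z[i] = longest common prefix of t and t[i..]; z[0] = |t| (like nex[] above).
std::vector<int> z_of(const std::string& t) {
    int n = t.size();
    std::vector<int> z(n, 0);
    for (int i = 1, l = 0, r = 0; i < n; ++i) {
        if (i < r) z[i] = std::min(r - i, z[i - l]);
        while (i + z[i] < n && t[z[i]] == t[i + z[i]]) ++z[i];
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    if (n > 0) z[0] = n;
    return z;
}

// ext[i] = longest common prefix of s[i..] and t (like ext[] above).
std::vector<int> extend_of(const std::string& s, const std::string& t) {
    assert(!t.empty());
    std::vector<int> z = z_of(t);
    int n = s.size(), m = t.size();
    std::vector<int> ext(n, 0);
    for (int i = 0, l = 0, r = 0; i < n; ++i) {
        // Invariant: s[l..r) equals t[0..r-l), so s[i..r) equals t[i-l..r-l).
        if (i < r) ext[i] = std::min(r - i, z[i - l]);
        while (i + ext[i] < n && ext[i] < m && s[i + ext[i]] == t[ext[i]]) ++ext[i];
        if (i + ext[i] > r) { l = i; r = i + ext[i]; }
    }
    return ext;
}
```

Each character of s is compared successfully at most once past `r`, so both passes stay linear.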

Luogu-P3375 [Template] KMP String Matching


Problem: problem link
Test score: 100
Main algorithm: string KMP
Problem statement: a KMP template problem

Code

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define FORa(i,s,e) for(int i=s;i<=e;i++)
#define FORs(i,s,e) for(int i=s;i>=e;i--)
#define gc pa==pb&&(pb=(pa=buf)+fread(buf,1,100000,stdin),pa==pb)?EOF:*pa++
#define File(name) freopen(name".in","r",stdin);freopen(name".out","w",stdout);
using namespace std;
static char buf[100000],*pa=buf,*pb=buf;
inline int read();
const int MAXN=1000000;
char sta[MAXN+2],stb[MAXN+2];   // 1-based storage plus the terminating '\0' needs MAXN+2
int lena,lenb,next[MAXN+2];
int main()
{
    scanf("%s%s",sta+1,stb+1);   // strings are stored starting at index 1
    lena=strlen(sta+1),lenb=strlen(stb+1);
    int j=0;

C - Cutting the Patterned Cloth Strip (KMP Example Problem)


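There is a strip of patterned cloth with some designs on it, and a small ready-to-use decorative strip that also has designs on it. Given the cloth strip and the decorative strip, compute the maximum number of decorative strips that can be cut from the cloth strip.

Input
The input contains several data sets, each a pair consisting of a cloth strip and a decorative strip. The strips are written in visible ASCII characters; there are as many kinds of pattern as there are visible ASCII characters. Neither strip is longer than 1000 characters. Processing stops when the # character is encountered.

Output
Output the maximum number of decorative strips that can be cut from the cloth. If there are none, simply output 0. Each result goes on its own line.

Sample Input
abcde a3
aaaaaa aa
#

Sample Output
0
3

Use KMP to compute the prefix array, then count how many matches there are.

#include<iostream>
#include<cstring>
#include<string>
#include<cstdio>
#include<vector>
using namespace std;
const int N=1e5+7;
int nxt[N];
int main(){
    string a,b;
    while(cin>>a){
        memset(nxt,0,sizeof(nxt));
        if(a=="#") break;
        cin>>b;
        //string concatenation
        //----------------------------
        int a1=a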
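The code above is cut off and appears to use a concatenation approach. The sketch below takes a different and slightly simpler route, under the assumption that cut pieces may not overlap: match directly with KMP and reset the pattern index to 0 after every full match. The function name `count_pieces` is made up for the illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Counts non-overlapping occurrences of `piece` in `cloth`, scanning left to right.
int count_pieces(const std::string& cloth, const std::string& piece) {
    assert(!piece.empty());
    int n = cloth.size(), m = piece.size();
    std::vector<int> fail(m, 0);  // fail[i] = longest proper border of piece[0..i]
    for (int i = 1, k = 0; i < m; ++i) {
        while (k > 0 && piece[i] != piece[k]) k = fail[k - 1];
        if (piece[i] == piece[k]) ++k;
        fail[i] = k;
    }
    int count = 0;
    for (int i = 0, j = 0; i < n; ++i) {
        while (j > 0 && cloth[i] != piece[j]) j = fail[j - 1];
        if (cloth[i] == piece[j]) ++j;
        if (j == m) {
            ++count;
            j = 0;  // start over: the next piece must not reuse cut cloth
        }
    }
    return count;
}
```

For the sample pairs, `count_pieces("abcde", "a3")` is 0 and `count_pieces("aaaaaa", "aa")` is 3. Resetting `j` to 0 rather than to `fail[m - 1]` is what rules out overlapping matches.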

cf Text Editor (KMP + binary search)


http://codeforces.com/gym/101466/problem/E

E. Text Editor
time limit per test: 1.0 s
memory limit per test: 512 MB
input: standard input
output: standard output

One of the most useful tools nowadays are text editors, their use is so important that the Unique Natural Advanced Language (UNAL) organization has studied many of the benefits working with them. They are interested specifically in the feature "find", that option looks when a pattern occurs in a text, furthermore, it counts the number of times the pattern occurs in a text. The tool is so well designed that while writing each character of
