Are there any working implementations of the rolling hash function used in the Rabin-Karp string search algorithm?

Submitted by 魔方 西西 on 2019-12-03 06:28:54

I remember a slightly different implementation, which seems to come from one of Sedgewick's algorithms books (it also contains example code, so try to look it up). Here's a summary, adjusted to 32-bit integers:

You apply modulo arithmetic after each operation to prevent your integer from overflowing.

Initially set:

  • c = text ("stackoverflow")
  • M = length of the "n-grams"
  • d = size of your alphabet (256)
  • q = a large prime so that (d+1)*q doesn't overflow (8355967 might be a good choice)
  • dM = d^(M-1) mod q
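A small Python sketch of these constants, assuming M = 5 as a hypothetical n-gram length (d and q are the values from the list). It checks the overflow condition on q and builds dM one multiplication at a time, reducing mod q after each step, so no intermediate value needs more than 32 bits:

```python
D = 256          # d: alphabet size
Q = 8355967      # q: the modulus suggested in the list
M = 5            # hypothetical n-gram length for this example

# (d+1)*q must fit in a signed 32-bit int; this bounds h + d*q in the rolling step.
assert (D + 1) * Q <= 2**31 - 1

# dM = d^(M-1) mod q, built by repeated multiplication. Each product DM * D is
# below q * d, so it also stays inside 32 bits.
DM = 1
for _ in range(M - 1):
    DM = (DM * D) % Q

print(DM)
```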

First calculate the hash value of the first n-gram:

h = 0
for i from 1 to M:
  h = (h*d + c[i]) mod q
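The loop above is Horner's rule for reading the first M characters as a base-d number, reduced mod q at every step. A minimal Python version, assuming 0-based indexing and characters whose code points are below d (e.g. ASCII for d = 256); the function name is illustrative:

```python
def first_ngram_hash(text, m, d=256, q=8355967):
    """Hash of text[0:m], following the pseudocode loop with 0-based indexing.

    Assumes every character's code point is below d (e.g. ASCII for d = 256).
    """
    h = 0
    for ch in text[:m]:
        # Shift the previous value one base-d "digit" left, add the next
        # character, and reduce mod q so h always stays below q.
        h = (h * d + ord(ch)) % q
    return h


print(first_ngram_hash("stackoverflow", 5))
```

Because reducing mod q at every step gives the same result as reducing once at the end, the value equals the base-256 number spelled by "stack", taken mod q.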

Then, for every following n-gram:

for i from 1 to length(c)-M:
  // first subtract the oldest character
  h = (h + d*q - c[i]*dM) mod q

  // then add the next character
  h = (h*d + c[i+M]) mod q
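Putting both loops together, here is a minimal Python sketch of the rolling hash plus a Rabin-Karp search built on it. It assumes 0-based indexing (window i covers text[i:i+m]), d = 256, q = 8355967, and characters with code points below d; the function names are illustrative, not from any book or library:

```python
def rolling_hashes(text, m, d=256, q=8355967):
    """Hash of every length-m window of text, using the rolling update.

    Window i covers text[i:i+m]. Assumes every character's code point is
    below d (e.g. ASCII for d = 256).
    """
    if m <= 0 or m > len(text):
        return []
    c = [ord(ch) for ch in text]
    dm = pow(d, m - 1, q)                  # dM = d^(M-1) mod q

    h = 0
    for i in range(m):                     # hash of the first window
        h = (h * d + c[i]) % q
    hashes = [h]

    for i in range(len(c) - m):
        h = (h + d * q - c[i] * dm) % q    # subtract the oldest character
        h = (h * d + c[i + m]) % q         # add the next character
        hashes.append(h)
    return hashes


def rabin_karp_find(text, pattern, d=256, q=8355967):
    """Start indices of pattern in text. Equal hashes are confirmed with a
    direct comparison, because different strings can collide mod q."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    target = rolling_hashes(pattern, m, d, q)[0]
    return [i for i, h in enumerate(rolling_hashes(text, m, d, q))
            if h == target and text[i:i + m] == pattern]


print(rabin_karp_find("stackoverflow", "over"))   # [5]
print(rabin_karp_find("stackoverflow", "o"))      # [5, 11]
```

Python integers never overflow and its % never returns a negative value for a positive modulus, so the d*q term is not strictly needed here; it is kept so the code mirrors the 32-bit pseudocode line for line.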

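You have to add d*q before subtracting the oldest character because the previous modulo operation may have left h smaller than c[i]*dM, and the subtraction would then go negative. Since c[i] < d and dM < q, c[i]*dM is always less than d*q, so the sum stays non-negative; and because d*q is a multiple of q, adding it doesn't change the result mod q.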
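To see the problem concretely: in C or Java the remainder operator keeps the sign of the dividend, so a negative intermediate value stays negative after `% q`. Python's `%` does not behave that way, so the sketch below uses a hypothetical helper, `c_remainder`, to mimic C/Java semantics. The numbers (h = 10, a window length of 4, outgoing character 'z') are made up for illustration:

```python
def c_remainder(a, b):
    """Remainder with C/Java semantics (truncating division): the result takes
    the sign of a. Python's own % returns a value with the sign of b."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


d, q = 256, 8355967
dm = pow(d, 3, q)     # d^(M-1) mod q for a hypothetical window length M = 4
h = 10                # hypothetical small hash left over from the previous "mod q"
oldest = ord("z")     # character leaving the window

naive = c_remainder(h - oldest * dm, q)          # negative: not a valid hash
safe = c_remainder(h + d * q - oldest * dm, q)   # lands in [0, q)

print(naive < 0, 0 <= safe < q)   # True True
print(h + d * q < 2**31)          # True: the shifted value still fits in 32 bits
```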

There may be errors in this, but I think you get the idea. Try to find one of Sedgewick's algorithms books for details, fewer errors and a better description. :)

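As I understand it, the task is to minimize the following expression while keeping it non-negative, i.e. to find the largest base A for which the hash cannot overflow:

2^31 - sum_k (maxchar * A^k)

where maxchar = 62 (for A-Za-z0-9) and k runs over the character positions in the window. I've just calculated it in Excel (OpenOffice Calc, to be exact) :) and the largest A it found is 76, or 73 if A has to be prime.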
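The same search can be done with a few lines of Python instead of a spreadsheet. The answer doesn't state the window length; assuming a window of 5 characters (a guess on my part) reproduces both figures quoted above, 76 and 73. The helper names are hypothetical:

```python
def largest_safe_base(max_char, window, limit=2**31):
    """Largest base A such that sum_{k=0}^{window-1} max_char * A**k < limit.

    Assumes the sum for A = 1 is already below limit. The sum grows with A
    only when window >= 2, so smaller windows are rejected.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    a = 1
    while sum(max_char * (a + 1) ** k for k in range(window)) < limit:
        a += 1
    return a


def is_prime(n):
    """Trial division; fine for the small bases involved here."""
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True


a = largest_safe_base(62, 5)      # window of 5 characters is an assumption
p = max(n for n in range(2, a + 1) if is_prime(n))
print(a, p)                       # 76 73
```

With A = 76 the sum is 2,096,034,310, just under 2^31 = 2,147,483,648, while A = 77 gives 2,208,166,022, which overflows.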

Not sure what your aim is here, but if you are trying to improve performance, using math.pow will cost you far more than you save by calculating a rolling hash value.
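As an illustration, assuming the question's math.pow is a floating-point power function such as Python's `math.pow`: besides its cost, it returns a float, which cannot hold large integer powers exactly. The rolling update needs no power call inside the loop at all; the only power it uses, d^(M-1) mod q, can be computed once, exactly, with integer arithmetic (d, q and m below are example values):

```python
import math

# math.pow returns a float. 31**12 is odd and larger than 2**53, so a double
# cannot represent it exactly and the result is rounded.
exact = 31 ** 12
approx = math.pow(31, 12)
print(int(approx) == exact)       # False

# What the rolling hash actually needs: one modular power, computed once
# before the loop with Python's exact three-argument pow.
d, q, m = 256, 8355967, 5
dm = pow(d, m - 1, q)
print(dm == (d ** (m - 1)) % q)   # True
```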

I suggest you start by keeping it simple and efficient; you are very likely to find that it is fast enough.
