I posed a question to Stack Overflow a few weeks ago about creating an efficient algorithm to search for a pattern in a large chunk of text. Right now I am using the St
From my understanding, Rabin-Karp is best used when searching a block of text for multiple words/phrases.
Think about a bad word search, for flagging abusive language.
If you have a list of 2000 words, including derivations, then you would need to call indexOf 2000 times, one for each word you are trying to find.
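That naive approach looks something like this sketch (the word list and helper name are hypothetical, just for illustration):

```java
import java.util.List;

public class NaiveSearch {
    // Naive approach: one indexOf call per word in the list.
    // With 2000 words, that's up to 2000 full scans of the text.
    static boolean containsAny(String text, List<String> badWords) {
        for (String word : badWords) {
            if (text.indexOf(word) >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> badWords = List.of("darn", "heck");
        System.out.println(containsAny("what the heck is this", badWords)); // true
        System.out.println(containsAny("perfectly polite text", badWords)); // false
    }
}
```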
Rabin-Karp helps with this by doing the search the other way around. Make a 4-character hash of each of the 2000 words, and put that into a dictionary with a fast lookup.
Now, for each 4 characters of the search text, hash and check against the dictionary.
As you can see, the search is now the other way around - we're searching the 2000 words for a possible match instead. Then we get the string from the dictionary and do an equals check to be sure.
It's also a faster search this way, because we're searching a dictionary instead of string matching.
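Here's a rough sketch of that idea in Java. All the names here are made up for illustration; the hash constants are arbitrary, and for simplicity it only indexes words of at least 4 characters (shorter words would need separate handling):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RabinKarpSearch {
    static final int WINDOW = 4;          // hash the first 4 chars of each word
    static final long BASE = 256;
    static final long MOD = 1_000_003L;   // arbitrary prime, illustration only

    static long hash(CharSequence s, int from) {
        long h = 0;
        for (int i = from; i < from + WINDOW; i++) {
            h = (h * BASE + s.charAt(i)) % MOD;
        }
        return h;
    }

    // Map each word's 4-char prefix hash to the words sharing that hash.
    static Map<Long, List<String>> buildIndex(List<String> words) {
        Map<Long, List<String>> index = new HashMap<>();
        for (String w : words) {
            if (w.length() >= WINDOW) {
                index.computeIfAbsent(hash(w, 0), k -> new ArrayList<>()).add(w);
            }
        }
        return index;
    }

    // Slide a 4-char window over the text with a rolling hash;
    // on a hash hit, confirm with a real string comparison (the "equals check").
    static List<String> findAll(String text, List<String> words) {
        Map<Long, List<String>> index = buildIndex(words);
        List<String> matches = new ArrayList<>();
        if (text.length() < WINDOW) return matches;

        long pow = 1;                                 // BASE^(WINDOW-1) % MOD
        for (int i = 1; i < WINDOW; i++) pow = (pow * BASE) % MOD;

        long h = hash(text, 0);
        for (int i = 0; ; i++) {
            for (String w : index.getOrDefault(h, List.of())) {
                // Hash collisions are possible, so verify for real.
                if (text.startsWith(w, i)) matches.add(w);
            }
            if (i + WINDOW >= text.length()) break;
            // Roll the hash: drop text[i], add text[i + WINDOW].
            h = (h - text.charAt(i) * pow % MOD + MOD) % MOD;
            h = (h * BASE + text.charAt(i + WINDOW)) % MOD;
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> badWords = List.of("heck", "darn");
        System.out.println(findAll("oh heck, what a darned mess", badWords));
    }
}
```

Note that the text is scanned once, no matter how many words are in the list; each window position costs one constant-time hash update plus a dictionary lookup.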
Now, imagine the WORST case scenario of doing all those indexOf searches - the very LAST word we check is a match ...
The Wikipedia article for Rabin-Karp even mentions its inferiority in the situation you describe. ;-) http://en.wikipedia.org/wiki/Rabin-Karp_algorithm