Given an arbitrary string, what is an efficient method of finding duplicate phrases? We can say that phrases must be longer than a certain length to be included.
Id
Like the earlier folks mention that suffix tree is the best tool for the job. My favorite site for suffix trees is http://www.allisons.org/ll/AlgDS/Tree/Suffix/. It enumerates all the nifty uses of suffix trees on one page and has a test js applicaton embedded to test strings and work through examples.