Repeated phrases in the text (Python)

借酒劲吻你 2021-01-21 02:22

I have a problem and no idea how to solve it. Please give me some advice.

I have a text. Big, big text. The task is to find all the repeated phrases which len

4 Answers
  •  遇见更好的自我
    2021-01-21 02:47

    Here's a roughly O(n) solution, which should work on fairly large input texts. If it's too slow, you probably want to look into using Perl, which was designed for text processing, or C++ for pure performance.

    >>> import collections
    >>> s = 'The quick brown fox jumps over the lazy dog'
    >>> words = s.lower().split()
    >>> phrases = collections.defaultdict(int)
    >>> # Slide a three-word window over the text and count each phrase.
    >>> for a, b, c in zip(words, words[1:], words[2:]):
    ...     phrases[(a, b, c)] += 1
    ...
    >>> phrases
    defaultdict(<class 'int'>, {('the', 'quick', 'brown'): 1, ('quick', 'brown', 'fox'): 1, ('brown', 'fox', 'jumps'): 1, ('fox', 'jumps', 'over'): 1, ('jumps', 'over', 'the'): 1, ('over', 'the', 'lazy'): 1, ('the', 'lazy', 'dog'): 1})
    >>> [phrase for phrase, count in phrases.items() if count > 1]
    []
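
The same sliding-window idea can be wrapped in a function that takes the phrase length as a parameter. This is only a sketch: the name `repeated_phrases` and the parameters `n` and `min_count` are made up for illustration, and it assumes that splitting on whitespace is good enough (punctuation stays attached to words).

```python
from collections import Counter


def repeated_phrases(text, n=3, min_count=2):
    """Return {phrase: count} for every n-word phrase seen at least min_count times."""
    words = text.lower().split()
    # Zipping n staggered views of the word list yields each n-word window.
    windows = zip(*(words[i:] for i in range(n)))
    counts = Counter(" ".join(window) for window in windows)
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


sample = "the cat sat on the mat and the cat sat on the chair"
print(repeated_phrases(sample))
# prints {'the cat sat': 2, 'cat sat on': 2, 'sat on the': 2}
```

Each word is visited a fixed number of times per window, so for a fixed `n` the running time stays roughly linear in the length of the text, at the cost of holding every distinct phrase in memory.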
    
