Computing N Grams using Python

情歌与酒 2020-11-28 06:02

I needed to compute the unigrams, bigrams, and trigrams for a text file containing text like:

"Cystic fibrosis affects 30,000 children and young adults in the US a

8 answers
  •  醉梦人生
    2020-11-28 06:16

    A short Pythonesque solution from this blog:

    def find_ngrams(input_list, n):
      # zip over n staggered slices; list() is needed on Python 3, where zip is lazy
      return list(zip(*[input_list[i:] for i in range(n)]))
    

    Usage:

    >>> input_list = ['all', 'this', 'happened', 'more', 'or', 'less']
    >>> find_ngrams(input_list, 1)
    [('all',), ('this',), ('happened',), ('more',), ('or',), ('less',)]
    >>> find_ngrams(input_list, 2)
    [('all', 'this'), ('this', 'happened'), ('happened', 'more'), ('more', 'or'), ('or', 'less')]
    >>> find_ngrams(input_list, 3)
    [('all', 'this', 'happened'), ('this', 'happened', 'more'), ('happened', 'more', 'or'), ('more', 'or', 'less')]
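
To get from a list of tokens to counts over raw text, as the question asks, the same zip trick can be paired with `collections.Counter`. This is only a rough sketch: the helper name `ngram_counts` is made up for illustration, and it assumes that lowercasing and splitting on whitespace is good enough tokenization, so punctuation stays attached to words.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count n-grams in text (hypothetical helper, naive tokenization)."""
    # Assumption: lowercase and split on whitespace; punctuation is not stripped.
    tokens = text.lower().split()
    # Zipping n staggered slices yields every run of n consecutive tokens.
    grams = zip(*[tokens[i:] for i in range(n)])
    return Counter(grams)


sample = "all this happened more or less all this"
bigrams = ngram_counts(sample, 2)
print(bigrams.most_common(1))  # [(('all', 'this'), 2)]
```

For a file, the text could be read with `open(path).read()` and passed in with n set to 1, 2, and 3 in turn. If the text is shorter than n tokens, the result is simply an empty Counter.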
    
