Computing N Grams using Python

情歌与酒 2020-11-28 06:02

I needed to compute the unigrams, bigrams, and trigrams for a text file containing text like:

"Cystic fibrosis affects 30,000 children and young adults in the US a

8 answers
  •  自闭症患者
    2020-11-28 06:33

    NLTK has built-in support for n-grams.

    `n` is the n-gram size; for example, n=3 gives trigrams.

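    from nltk import ngrams

    def ngramize(texts, n):
        # Collect the word-level n-grams (tuples of n tokens) from every text.
        output = []
        for text in texts:
            # ngrams() expects a token sequence; a raw string would yield
            # character n-grams, so split on whitespace first (assumes str input).
            output += ngrams(text.split(), n)
        return output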
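
If NLTK is not available, roughly the same result can be sketched in plain Python. This is only an illustration, not part of the answer above: `word_ngrams` is a made-up helper name, the tokenizer is a naive whitespace split, and the sample string is just the visible part of the text quoted in the question.

```python
from collections import Counter

def word_ngrams(text, n):
    """Return the word-level n-grams of `text` as a list of tuples."""
    tokens = text.split()  # assumption: whitespace tokenization is good enough
    # Zip n staggered views of the token list; zip stops at the shortest one,
    # so inputs with fewer than n tokens yield an empty list.
    return list(zip(*(tokens[i:] for i in range(n))))

sample = "Cystic fibrosis affects 30,000 children and young adults"

# Count unigrams, bigrams and trigrams in one pass each.
for n in (1, 2, 3):
    counts = Counter(word_ngrams(sample, n))
    print(n, len(counts))
```

Wrapping the result in a `Counter` is one way to get frequencies per n-gram; punctuation handling and lowercasing are left out here and would likely matter on real text.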
    
