Computing N Grams using Python

情歌与酒 2020-11-28 06:02

I need to compute the unigrams, bigrams and trigrams for a text file containing text like:

"Cystic fibrosis affects 30,000 children and young adults in the US a

8 Answers
  •  醉话见心
    2020-11-28 06:26

    Use NLTK (the Natural Language Toolkit): tokenize (split) your text into a list of words, then use its helper functions to find the bigrams and trigrams.

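    import nltk
    # my_text is the string read from your file; word_tokenize needs NLTK's
    # tokenizer data, which is installed separately via nltk.download()
    words = nltk.word_tokenize(my_text)
    # bigrams/trigrams return generators; wrap them in list() to reuse them
    my_bigrams = list(nltk.bigrams(words))
    my_trigrams = list(nltk.trigrams(words))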
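
If installing NLTK is not an option, the same idea can be sketched with the standard library alone. This is only a rough illustration, not what NLTK does internally: the `ngrams` helper and the sample sentence are made up for the example, and the whitespace split is a naive stand-in for a real tokenizer (it will not separate punctuation).

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all n-grams over a token list, each as a tuple of n tokens."""
    # Zip the list against copies of itself shifted by 0, 1, ..., n-1 positions.
    return list(zip(*(tokens[i:] for i in range(n))))

# Hypothetical sample text; in practice, read the contents of your file instead.
sample = "cystic fibrosis affects children and young adults"
tokens = sample.lower().split()  # naive whitespace tokenization (assumption)

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

# Counter tallies how often each n-gram occurs.
bigram_counts = Counter(bigrams)
print(bigram_counts.most_common(3))
```

Each n-gram is a tuple, so it can be used directly as a dictionary key or counted with `Counter`. A token list shorter than `n` simply yields an empty list.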
    
