I needed to compute the unigrams, bigrams, and trigrams for a text file containing text like:
"Cystic fibrosis affects 30,000 children and young adults in the US a
Use NLTK (the Natural Language Toolkit): tokenize (split) your text into a list of words, then pass that list to its bigram and trigram functions. The unigrams are simply the tokens themselves.
    import nltk
    words = nltk.word_tokenize(my_text)
    my_bigrams = nltk.bigrams(words)
    my_trigrams = nltk.trigrams(words)
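If it helps to see what those calls are doing, here is a minimal standard-library sketch of the same idea. It is not NLTK's implementation: the regex tokenizer is a crude stand-in for `nltk.word_tokenize`, and the names `tokenize`, `ngrams`, and `ngram_counts` are hypothetical helpers made up for illustration. The sample string is a shortened piece of the text from the question.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a simple regex is good enough here. It keeps numbers
    such as "30,000" together but is far cruder than nltk.word_tokenize.
    """
    return re.findall(r"\w+(?:[,.']\w+)*", text.lower())


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as a list of tuples."""
    # Zipping the token list against shifted copies of itself yields
    # every window of n adjacent tokens.
    return list(zip(*(tokens[i:] for i in range(n))))


def ngram_counts(text, n):
    """Count how often each n-gram occurs in the text."""
    return Counter(ngrams(tokenize(text), n))


if __name__ == "__main__":
    sample = "Cystic fibrosis affects 30,000 children and young adults"
    tokens = tokenize(sample)
    print(tokens)             # unigrams
    print(ngrams(tokens, 2))  # bigrams
    print(ngrams(tokens, 3))  # trigrams
```

Note that in NLTK 3, `nltk.bigrams` and `nltk.trigrams` return generators, so wrap them in `list(...)` if you want to print them or iterate more than once. To get frequencies rather than the raw n-grams, a `Counter` over the result (as in `ngram_counts` above) is usually enough.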