I need to compute the unigrams, bigrams and trigrams for a text file containing text like:
"Cystic fibrosis affects 30,000 children and young adults in the US a
NLTK has native support for n-grams through `nltk.ngrams` (defined in `nltk.util`).
`n` is the n-gram size, e.g. `n=3` gives trigrams.
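```python
from nltk import ngrams

def ngramize(texts, n):
    # texts: an iterable of token lists, e.g. [["cystic", "fibrosis", ...], ...].
    # Passing raw strings instead would produce character n-grams, not word n-grams.
    output = []
    for text in texts:
        output += ngrams(text, n)  # tuples of n consecutive tokens
    return output
```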
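If NLTK is not available, the same unpadded n-grams can be built with the standard library alone. This is a minimal sketch, not NLTK's implementation: the helper names `tokenize`, `ngrams`, `count_ngrams` and `count_ngrams_in_file` are hypothetical, and the tokenizer (lowercase, keep word characters, keep numbers like `30,000` whole) is an assumption you may want to swap for `nltk.word_tokenize`.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase, then take either a number containing
    # commas (e.g. "30,000") or a run of word characters.
    return re.findall(r"\d[\d,]*\d|\w+", text.lower())

def ngrams(tokens, n):
    # Zip n staggered slices of the token list; this yields the same tuples
    # as nltk.ngrams(tokens, n) with no padding. Empty if n > len(tokens).
    return list(zip(*(tokens[i:] for i in range(n))))

def count_ngrams(text, max_n=3):
    # Map n -> Counter of n-gram tuples, for n = 1 (unigrams) .. max_n.
    tokens = tokenize(text)
    return {n: Counter(ngrams(tokens, n)) for n in range(1, max_n + 1)}

def count_ngrams_in_file(path, max_n=3):
    # Hypothetical convenience wrapper: read the whole file, then count.
    with open(path, encoding="utf-8") as f:
        return count_ngrams(f.read(), max_n)

if __name__ == "__main__":
    sample = "Cystic fibrosis affects 30,000 children and young adults in the US"
    counts = count_ngrams(sample)
    for n, label in [(1, "unigrams"), (2, "bigrams"), (3, "trigrams")]:
        print(label, counts[n].most_common(3))
```

Reading the whole file at once keeps n-grams from being cut at line breaks; for very large files you would instead tokenize line by line and carry the last `n - 1` tokens over to the next line.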