I needed to compute the unigrams, bigrams and trigrams for a text file containing text like:
"Cystic fibrosis affects 30,000 children and young adults in the US a
Using `collections.deque`:
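```python
from collections import deque
from itertools import islice

def ngrams(message, n=1):
    it = iter(message.split())
    # Fill the first window with up to n words; maxlen=n drops the oldest word on append
    window = deque(islice(it, n), maxlen=n)
    if len(window) == n:  # skip the partial tuple when there are fewer than n words
        yield tuple(window)
    for item in it:
        window.append(item)  # slide the window one word to the right
        yield tuple(window)
```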
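The generator above takes a whole string and splits it in one go. Since the question is about a text file, here is a minimal sketch, assuming plain UTF-8 text and that whitespace splitting is good enough tokenization, that feeds the same sliding window lazily from any iterable of lines (an open file works). The helper name `iter_words` and the path `corpus.txt` are hypothetical.

```python
from collections import deque
from itertools import islice


def iter_words(lines):
    """Lazily yield whitespace-separated words from an iterable of lines (e.g. an open file)."""
    for line in lines:
        yield from line.split()


def ngrams(words, n=1):
    """Yield n-word tuples from any iterable of words using a sliding window."""
    it = iter(words)
    window = deque(islice(it, n), maxlen=n)
    if len(window) < n:
        return  # input shorter than n: no complete n-gram
    yield tuple(window)
    for word in it:
        window.append(word)  # maxlen=n discards the oldest word
        yield tuple(window)


# Hypothetical usage: unigrams, bigrams and trigrams from a file.
# Each pass re-opens the file, since the word stream is consumed once.
# grams = {}
# for n in (1, 2, 3):
#     with open("corpus.txt", encoding="utf-8") as f:
#         grams[n] = list(ngrams(iter_words(f), n))
```

Because the words are streamed as one sequence, n-grams here cross line breaks; if that is not wanted, call `ngrams` on each line separately instead.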
...or maybe you could do it in one line as a list comprehension:
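```python
n = 2
message = "Hello, how are you?".split()
# There are len(message) - n + 1 start positions, so the "+ 1" keeps the last n-gram
myNgrams = [message[i:i+n] for i in range(len(message) - n + 1)]
```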
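If "compute" means frequency counts rather than just the list of n-grams, one option (an assumption about the goal) is to feed the n-grams into `collections.Counter`. This sketch swaps in a different, commonly used idiom instead of the index-based comprehension: `zip` over `n` staggered slices of the word list. The function name `count_ngrams` is hypothetical.

```python
from collections import Counter


def count_ngrams(text, n):
    """Return a Counter mapping each n-word tuple in `text` to its frequency."""
    words = text.split()
    # zip(words[0:], words[1:], ..., words[n-1:]) stops at the shortest slice,
    # so it yields len(words) - n + 1 tuples (none if there are fewer than n words).
    return Counter(zip(*(words[i:] for i in range(n))))


text = "to be or not to be"
for n in (1, 2, 3):
    print(n, count_ngrams(text, n).most_common(2))
```

Using tuples as keys keeps unigrams, bigrams and trigrams in the same shape, which makes it easy to merge counts of different orders later if needed.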