Computing N Grams using Python

情歌与酒 2020-11-28 06:02

I needed to compute the unigrams, bigrams and trigrams for a text file containing text like:

"Cystic fibrosis affects 30,000 children and young adults in the US a

8 answers

北海茫月 (OP)

2020-11-28 06:33

Using collections.deque:

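from collections import deque
from itertools import islice

def ngrams(message, n=1):
    # Slide a fixed-size window over the whitespace-separated words.
    it = iter(message.split())
    window = deque(islice(it, n), maxlen=n)
    if len(window) < n:
        return  # fewer than n words: there are no n-grams to yield
    yield tuple(window)
    for item in it:
        # Appending to a full deque drops the oldest word automatically.
        window.append(item)
        yield tuple(window)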
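
The sliding window relies on how a deque with maxlen behaves: once it is full, each append pushes the oldest item out on the other side. A tiny sketch of just that mechanism, with made-up placeholder words:

```python
from collections import deque

# A bounded deque holding the current 2-word window.
window = deque(["a", "b"], maxlen=2)

# Appending to a full deque silently discards the leftmost item ("a").
window.append("c")

print(tuple(window))  # ('b', 'c')
```

This is why the generator never has to slice or copy the word list: each step is one append followed by one tuple() call.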

...or maybe you could do it in one line as a list comprehension:

n = 2
message = "Hello, how are you?".split()
# The + 1 in the range keeps the final n-gram; leaving it out drops the last window.
myNgrams = [message[i:i+n] for i in range(len(message) - n + 1)]
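
Since the question asks for unigrams, bigrams and trigrams together, here is one possible way to count all three. It is only a sketch: count_ngrams is a hypothetical helper, the sample string is made up, and it swaps in zip over staggered slices instead of either approach above. It also assumes plain whitespace splitting is good enough, so punctuation stays attached to words.

```python
from collections import Counter

def count_ngrams(text, n):
    """Count word n-grams in text (hypothetical helper, not from the answer)."""
    words = text.split()
    # zip over n staggered views of the word list; zip stops at the
    # shortest view, so every tuple has exactly n words.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(grams)

sample = "the cat sat on the mat the cat"  # made-up sample text

for n in (1, 2, 3):
    # most_common(2) shows the two most frequent n-grams for this n
    print(n, count_ngrams(sample, n).most_common(2))
```

For a real file, the text could be read with open(...).read() and passed in the same way; for very large files the generator version above is likely the better fit, since it never builds extra lists.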
