Computing N Grams using Python

情歌与酒 2020-11-28 06:02

I need to compute the unigrams, bigrams and trigrams for a text file containing text like:

\"Cystic fibrosis affects 30,000 children and young adults in the US a

8 answers
  •  北海茫月
    2020-11-28 06:33

    Using collections.deque:

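    from collections import deque
    from itertools import islice

    def ngrams(message, n=1):
        """Yield each n-gram of whitespace-separated words as a tuple."""
        it = iter(message.split())
        # Fill the window with the first n words; maxlen=n makes later
        # appends drop the oldest word automatically.
        window = deque(islice(it, n), maxlen=n)
        if len(window) < n:
            return  # fewer than n words: no n-grams to yield
        yield tuple(window)
        for item in it:
            window.append(item)
            yield tuple(window)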
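
The generator above relies on how a bounded deque behaves: once it holds maxlen items, each append drops the oldest item from the left. A minimal sketch of that behavior on its own, using a made-up word list:

```python
from collections import deque

# Hypothetical sample words, only for illustration.
words = ["a", "b", "c", "d"]

# A deque bounded to 2 items acts as a sliding window of width 2.
window = deque(words[:2], maxlen=2)
snapshots = [tuple(window)]

for word in words[2:]:
    window.append(word)  # the oldest item is discarded automatically
    snapshots.append(tuple(window))

print(snapshots)  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

Because the window never grows past n items, memory use stays constant however long the input is.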
    

    ...or maybe you could do it in one line as a list comprehension:

    n = 2
    message = "Hello, how are you?".split()
    # + 1 so that the final n-gram is included
    myNgrams = [message[i:i+n] for i in range(len(message) - n + 1)]
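
If the goal is to count how often each unigram, bigram and trigram occurs, one possible sketch combines zip with collections.Counter. The helper name count_ngrams and the sample sentence are made up for illustration. Splitting on whitespace is also an assumption, so punctuation stays attached to the words.

```python
from collections import Counter

def count_ngrams(text, n):
    """Count n-grams of whitespace-separated words (hypothetical helper)."""
    words = text.split()
    # zip over n staggered views of the word list; zip stops at the
    # shortest view, so only complete n-grams are produced.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(grams)

# Made-up sample text, not taken from the question's file.
sample = "the cat sat on the mat the cat slept"

unigrams = count_ngrams(sample, 1)
bigrams = count_ngrams(sample, 2)
trigrams = count_ngrams(sample, 3)

print(unigrams.most_common(1))  # [(('the',), 3)]
print(bigrams[("the", "cat")])  # 2
print(len(trigrams))            # 7
```

For a text file, the same helper could be given the result of reading the file into a string. Lowercasing or stripping punctuation first is a separate choice that depends on the data.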
    
