Quick implementation of character n-grams for word

花落未央 2020-12-01 12:13

I wrote the following code for computing character bigrams, and the output is right below. My question is: how do I get an output that excludes the last character (i.e., t)? and

3 Answers

时光说笑 (OP)

2020-12-01 13:14

This function gives you n-grams for every size from 1 to n:

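def getNgrams(sentences, n):
    # For each sentence, collect every n-gram of size 1 up to n.
    ngrams = []
    for sentence in sentences:
        _ngrams = []
        for _n in range(1, n + 1):
            # Start at 0 and let the last start position be len(sentence) - _n,
            # so the first item is not skipped and every slice has _n items.
            for pos in range(len(sentence) - _n + 1):
                _ngrams.append(sentence[pos:pos + _n])
        ngrams.append(_ngrams)
    return ngrams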
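
If you only need one size (bigrams, say) for a single word, a shorter version may be enough. This is just a sketch: the name char_ngrams and the example word "student" are my own assumptions, since the asker's code and input are not shown here. The point is that the loop has to stop at len(word) - n, otherwise the final slice comes out shorter than n and you get a stray trailing piece like "t".

```python
def char_ngrams(word, n):
    """Return the character n-grams of `word`, each exactly n characters long.

    Hypothetical helper for illustration; not taken from the asker's code.
    """
    # The last valid start index is len(word) - n. Going further would
    # produce a truncated slice (for example a lone final character).
    return [word[i:i + n] for i in range(len(word) - n + 1)]


# "student" is an assumed example word that happens to end in "t".
print(char_ngrams("student", 2))
# ['st', 'tu', 'ud', 'de', 'en', 'nt']
```

For words shorter than n the range is empty, so the function returns an empty list instead of a partial n-gram.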
