Quick implementation of character n-grams for word

花落未央 2020-12-01 12:13

I wrote the following code for computing character bigrams, and the output is right below. My question is: how do I get output that excludes the last character (i.e. t)? and

3 answers
  •  时光说笑
    2020-12-01 13:14

    This function gives you the n-grams of every size from 1 to n:

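    def getNgrams(sentences, n):
        """Return, for each sentence, all of its n-grams of sizes 1 through n."""
        ngrams = []
        for sentence in sentences:
            _ngrams = []
            for _n in range(1, n + 1):
                # Start at 0 and stop at len - _n + 1 so the first and last
                # n-grams of each size are included.
                for pos in range(len(sentence) - _n + 1):
                    _ngrams.append(sentence[pos:pos + _n])
            ngrams.append(_ngrams)
        return ngrams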
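
For the bigram case in the question, a minimal sketch may be enough. The question's own code is not shown here, so this assumes the stray trailing character comes from slicing at every start position up to the end of the word. The function name `char_ngrams` and the sample word "student" are illustrative assumptions, not taken from the question.

```python
def char_ngrams(word, n=2):
    """Return all contiguous character n-grams of `word`, in order."""
    # Stop at len(word) - n + 1 so every slice is exactly n characters long.
    # Iterating all the way to len(word) would also emit shorter leftovers,
    # such as a lone final "t".
    return [word[i:i + n] for i in range(len(word) - n + 1)]


# "student" is a hypothetical example word.
print(char_ngrams("student"))  # ['st', 'tu', 'ud', 'de', 'en', 'nt']
```

If the word is shorter than n, the range is empty and the function returns an empty list rather than a partial n-gram.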
    
