Quick implementation of character n-grams for a word

花落未央 2020-12-01 12:13

I wrote the following code for computing character bigrams, and the output is right below. My question is: how do I get output that excludes the last character (i.e., t)? and

3 answers
  •  一向 (original poster)
    2020-12-01 13:03

    To generate bigrams:

    In [8]: b='student'
    
    In [9]: [b[i:i+2] for i in range(len(b)-1)]
    Out[9]: ['st', 'tu', 'ud', 'de', 'en', 'nt']
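
If "excluding the last character" means dropping the final t before building the bigrams (that is only one reading of the question, so treat it as an assumption), slicing the word first should be enough. A minimal sketch; the variable names `trimmed` and `bigrams_without_last` are made up for illustration:

```python
b = 'student'

# Assumption: "exclude the last character" means drop the final 't' first.
trimmed = b[:-1]  # 'studen'

# Same comprehension as above, applied to the trimmed word.
bigrams_without_last = [trimmed[i:i + 2] for i in range(len(trimmed) - 1)]
print(bigrams_without_last)  # ['st', 'tu', 'ud', 'de', 'en']
```

Equivalently, shrinking the range by one, `range(len(b) - 2)`, skips the last bigram without making a copy of the string.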
    

    To generalize to a different n:

    In [10]: n=4
    
    In [11]: [b[i:i+n] for i in range(len(b)-n+1)]
    Out[11]: ['stud', 'tude', 'uden', 'dent']
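
The same comprehension can be wrapped in a small function so it is reusable for any n. This is just a sketch: the name `char_ngrams` and the choice to raise on n < 1 are my own assumptions, not something from the question.

```python
def char_ngrams(word, n):
    """Return the character n-grams of `word` as a list (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # range() is empty when n > len(word), so words shorter than n give [].
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(char_ngrams('student', 2))  # ['st', 'tu', 'ud', 'de', 'en', 'nt']
print(char_ngrams('student', 4))  # ['stud', 'tude', 'uden', 'dent']
print(char_ngrams('st', 4))       # []
```

Returning an empty list for short words (instead of raising) keeps the function easy to use when mapping over a whole vocabulary.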
    
