How to extract character n-grams from sentences? - python

橙三吉。 submitted on 2019-12-03 15:45:40

Why not just (?=(...))

Edit: The same thing, but matching only non-whitespace characters: (?=(\S\S\S))
Edit 2: You can also restrict it to just the characters you want. For example, this one matches alphanumerics only: (?=([^\W_]{3}))
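
To compare the three patterns side by side, here is a small sketch using Python's re.findall. The sample string "hi all_ok" is made up for illustration and is not from the original question.

```python
import re

# Hypothetical sample input, chosen so the three patterns give different results.
text = "hi all_ok"

# Any 3 characters (except newline), spaces included.
any_chars = re.findall(r"(?=(...))", text)

# 3 consecutive non-whitespace characters.
non_space = re.findall(r"(?=(\S\S\S))", text)

# 3 consecutive alphanumerics ([^\W_] is \w without the underscore).
alnum = re.findall(r"(?=([^\W_]{3}))", text)

print(any_chars)  # ['hi ', 'i a', ' al', 'all', 'll_', 'l_o', '_ok']
print(non_space)  # ['all', 'll_', 'l_o', '_ok']
print(alnum)      # ['all']
```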

This uses a lookahead to capture 3 characters without consuming them. Because each match is
zero-width, the engine bumps the position forward by 1 after each match, then captures the next 3.

The result for foobar is:
foo
oob
oba
bar
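
In Python this result can be reproduced with re.findall, which returns the contents of capture group 1 for every match. The helper name char_ngrams and its n parameter are assumptions added here to generalize from 3 characters; treat it as a minimal sketch rather than the only way to do it.

```python
import re

def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of text (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Build (?=(...)) with n dots: a zero-width lookahead around one capture group.
    pattern = "(?=(" + "." * n + "))"
    return re.findall(pattern, text)

print(char_ngrams("foobar"))     # ['foo', 'oob', 'oba', 'bar']
print(char_ngrams("foobar", 4))  # ['foob', 'ooba', 'obar']
```

Since the dot does not match a newline, n-grams never span line breaks with this pattern.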

 # Compressed regex
 #  (?=(...))

 # Expanded regex
 (?=                   # Start Lookahead assertion
      (                     # Capture group 1 start
           .                     # dot - metachar, matches any character except newline
           .                     # dot - metachar
           .                     # dot - metachar
      )                     # Capture group 1 end
 )                     # End Lookahead assertion
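
The expanded form can be used directly in Python by compiling it with re.VERBOSE, which ignores whitespace and # comments inside the pattern. The name TRIGRAM is only an illustrative choice.

```python
import re

# Same regex as the expanded listing, kept readable with re.VERBOSE.
TRIGRAM = re.compile(r"""
    (?=          # start lookahead assertion
        (        # capture group 1 start
            .    # any character except newline
            .
            .
        )        # capture group 1 end
    )            # end lookahead assertion
""", re.VERBOSE)

print(TRIGRAM.findall("foobar"))  # ['foo', 'oob', 'oba', 'bar']
```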