POSIX character class does not work in base R regex

余生长醉 submitted on 2019-11-27 05:38:06

Although the ICU regex engine used by stringr supports bare POSIX character classes in the pattern, the base R regex flavors (both PCRE (perl=TRUE) and TRE, the default) require POSIX character classes to be inside bracket expressions: [:alnum:] -> [[:alnum:]].

x <- c("AZaz09 y AZaz09", "ĄŻaz09 y AZŁł09", "26 de Marzo y Pareyra de la Luz")
grepl("[[:alnum:][:blank:]]+[[:blank:]][yY][[:blank:]][[:alnum:][:blank:]]+", x)
## => [1] TRUE TRUE TRUE
grepl("[[:alnum:][:blank:]]+[[:blank:]][yY][[:blank:]][[:alnum:][:blank:]]+", x, perl=TRUE)
## => [1] TRUE TRUE TRUE


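When you use [:alnum:] alone, the default TRE engine parses it as a simple bracket expression that matches a single character from the set :, a, l, n, u, m. With perl=TRUE, PCRE instead rejects the bare class with a pattern compilation error.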
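A minimal sketch of the difference, assuming the default TRE engine (no perl=TRUE). The sample strings and variable names are made up for illustration:

```r
# Default (TRE) engine: perl = TRUE is deliberately not used here.
x <- c("xyz", "a:b", "123")

# Bare class: behaves like a set containing ':', 'a', 'l', 'n', 'u', 'm'
bare <- grepl("[:alnum:]", x)
bare
## => [1] FALSE  TRUE FALSE

# Class inside a bracket expression: any alphanumeric character
wrapped <- grepl("[[:alnum:]]", x)
wrapped
## => [1] TRUE TRUE TRUE

# The same mistake in a replacement silently deletes the wrong characters
stripped <- gsub("[:alnum:]", "", "manual: 42")
stripped
## => [1] " 42"
```

Only "a:b" matches the bare form, because it is the only string that contains one of the literal characters in the set.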

Pattern details:

  • [[:alnum:][:blank:]]+ - 1+ alphanumeric or horizontal whitespace characters
  • [[:blank:]] - a single horizontal whitespace character (space or tab)
  • [yY] - either y or Y
  • [[:blank:]] - a single horizontal whitespace character (space or tab)
  • [[:alnum:][:blank:]]+ - 1+ alphanumeric or horizontal whitespace characters
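
The breakdown above can also be written as code by assembling the pattern from its pieces. This is only a sketch: the variable names (words, gap, conj) and the second test string are illustrative and not part of the original answer.

```r
# Each piece corresponds to one bullet in the pattern details.
words <- "[[:alnum:][:blank:]]+"  # 1+ alphanumeric or horizontal whitespace characters
gap   <- "[[:blank:]]"            # exactly one space or tab
conj  <- "[yY]"                   # a literal y or Y

# Concatenate the pieces in order: words, gap, y/Y, gap, words
pattern <- paste0(words, gap, conj, gap, words)

x <- c("26 de Marzo y Pareyra de la Luz", "no-conjunction-here")
result <- grepl(pattern, x)
result
## => [1]  TRUE FALSE
```

The assembled string is identical to the pattern used in the grepl() calls earlier, so it should behave the same with or without perl=TRUE.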