Extract the substring matching a regex

醉酒当歌 submitted on 2021-01-27 22:03:25

Question


I am trying to extract 22 chocolates from the following string:

   SOMETEXT for 2 FFXX. Another 22 chocolates & 45 chamkila.

using the regex \\d+\\s*(chocolates.|chocolate.). I used:

grep("\\d+\\s*(chocolates.|chocolate.)",s)

but it does not return the string 22 chocolates. How can I extract the part that matches the regex?


Answer 1:


Here is an option using sub from base R:

x <- "SOMETEXT for 2 FFXX. Another 22 chocolates & 45 chamkila."
sub(".*?(\\d+ chocolates?).*", "\\1", x)

[1] "22 chocolates"

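The pattern in parentheses, (\\d+ chocolates?), is a capture group, and the text it matched is available as \\1 in the replacement string. Because the rest of the pattern consumes everything before and after the group, sub replaces the whole input with just the captured part.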


Edit:

As you have seen, if sub cannot find a match, it returns the input string unchanged. This behavior usually makes sense: when no substitution applies, you want the input left as it is.

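If you need to find out whether the pattern matches, calling grep is one option. With value = FALSE it returns the indices of the matching elements, or an empty integer vector when nothing matches:

grep(".*(\\d+ chocolates?).*", x, value = FALSE)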
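To make the match check concrete, here is a minimal sketch that combines grepl with the sub call so that a non-matching input yields NA instead of coming back unchanged. The helper name extract_chocolates and the second input y are hypothetical, and perl = TRUE is an assumption added here so that the lazy quantifier follows PCRE rules.

```r
x <- "SOMETEXT for 2 FFXX. Another 22 chocolates & 45 chamkila."
y <- "No sweets mentioned here."   # hypothetical input without a match

# Hypothetical helper: return the captured text, or NA when nothing matches.
extract_chocolates <- function(s) {
  pat <- "\\d+ chocolates?"
  hit <- grepl(pat, s, perl = TRUE)                     # does each element match?
  out <- sub(paste0(".*?(", pat, ").*"), "\\1", s, perl = TRUE)
  ifelse(hit, out, NA_character_)                       # hide unchanged inputs
}

extract_chocolates(c(x, y))
# [1] "22 chocolates" NA
```

grepl returns one logical value per element, which is usually easier to test than the index vector returned by grep.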



Answer 2:


Your original call does not return 22 chocolates because the pattern needs to be used with a matching (extraction) function. grep only reports which items of a character vector contain a match anywhere inside: their indices by default, or the whole items with value = TRUE.
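A short sketch of what grep reports for the string and pattern from the question may help here; the variable names s and p are just illustrative.

```r
s <- "SOMETEXT for 2 FFXX. Another 22 chocolates & 45 chamkila."
p <- "\\d+\\s*(chocolates.|chocolate.)"

idx   <- grep(p, s)                 # index of the element that contains a match
whole <- grep(p, s, value = TRUE)   # the whole element, not the matched part
found <- grepl(p, s)                # logical: is there a match at all?

idx
# [1] 1
identical(whole, s)
# [1] TRUE
found
# [1] TRUE
```

In none of these forms does grep hand back only the matched substring, which is why an extraction function is needed.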

Also, note that the alternation group (chocolates.|chocolate.) can be shortened to chocolates?. because the only difference between the alternatives is the plural s, which the ? quantifier (1 or 0 occurrences) handles.
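As a quick check of that equivalence, the sketch below runs both forms of the pattern over two made-up inputs (the vector x here is hypothetical test data, not from the question).

```r
x <- c("1 chocolate.", "22 chocolates & more")    # hypothetical test inputs

long  <- "\\d+\\s*(chocolates.|chocolate.)"       # original alternation
short <- "\\d+\\s*chocolates?."                   # shortened form

m_long  <- regmatches(x, regexpr(long, x))
m_short <- regmatches(x, regexpr(short, x))

m_short
# [1] "1 chocolate."   "22 chocolates "
identical(m_long, m_short)
# [1] TRUE
```

Note that the trailing . pulls one extra character into the match (a space in the second input), so it is dropped when the goal is to extract exactly 22 chocolates.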

One such matching function is stringr::str_extract (use str_extract_all to match all occurrences):

> library(stringr)
> x <- " SOMETEXT for 2 FFXX. Another 22 chocolates & 45 chamkila."
> p <- "\\d+\\s*chocolates?"
> str_extract(x, p)
[1] "22 chocolates"

Or use the base R regmatches/regexpr approach (with gregexpr to extract multiple occurrences):

> x <- " SOMETEXT for 2 FFXX. Another 22 chocolates & 45 chamkila."
> p <- "\\d+\\s*chocolates?"
> regmatches(x, regexpr(p, x))
[1] "22 chocolates"
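For the multiple-occurrence case mentioned above, a minimal base R sketch with gregexpr could look like this; the input string is made up for illustration.

```r
# Hypothetical input with several matches
x <- "3 chocolates for you, 1 chocolate for me, and 22 chocolates to share."
p <- "\\d+\\s*chocolates?"

# gregexpr finds every match; regmatches returns a list with one
# character vector per input element, so take the first element.
all_matches <- regmatches(x, gregexpr(p, x))[[1]]
all_matches
# [1] "3 chocolates"  "1 chocolate"   "22 chocolates"
```

When x has several elements, keep the full list instead of indexing with [[1]], so each input retains its own vector of matches.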


Source: https://stackoverflow.com/questions/48961838/extract-the-sub-string-matching-regex
