There are many possible regular expressions to do this. Here is one:
> x <- c("East Kootenay C (5901035) RDA 01011", "Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020")
> gsub('.+\\(([0-9]+)\\).+?$', '\\1', x)
[1] "5901035" "5933039"
Let's break down the syntax of that first expression, '.+\\(([0-9]+)\\).+?$':
.+
One or more of any character: . matches any single character and + means one or more.
\\(
Parentheses are special characters in a regular expression, so if I want to match the actual character ( I need to escape it with a \. I have to escape that backslash again for R (hence the two \s).
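The double escaping can be checked directly in R. This is a small sketch of my own; the variable names and the second test string are made up for illustration.

```r
# "\\(" is typed with two backslashes, but the string R stores has only one
pat <- "\\("
n <- nchar(pat)   # 2 characters: a backslash and a parenthesis
writeLines(pat)   # shows the pattern as the regex engine receives it

# The escaped pattern matches a literal opening parenthesis
has_paren <- grepl(pat, c("East Kootenay C (5901035)", "no parens here"))
print(n)
print(has_paren)
```

The first string contains a literal ( and the second does not, so only the first one matches.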
([0-9]+)
I mentioned special characters; here I use two kinds. The first is the parentheses, which mark a group I want to keep. The second, [ and ], define a character class: [0-9] matches any single digit, and the + after it means one or more of them. See ?regex for more information.
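To see what the group captures on its own, here is a minimal sketch using regexec and regmatches from base R. It uses only the middle part of the pattern, and the variable names are mine.

```r
s <- "East Kootenay C (5901035) RDA 01011"

# regexec reports the full match plus each parenthesised group;
# regmatches extracts the corresponding substrings
m <- regmatches(s, regexec('\\(([0-9]+)\\)', s))[[1]]

full_match <- m[1]  # the literal parens and the digits between them
group_1    <- m[2]  # only the digits, i.e. what \\1 refers to
print(full_match)
print(group_1)
```

The escaped parentheses are part of the match but not part of the group, which is why only the digits survive in the replacement.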
?$
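The final piece makes the match run to the end of the string: ? makes the .+ before it lazy, and $ anchors the match at the end. Together with the greedy .+ at the start, this assures that I am grabbing the LAST set of numbers in parens, as noted in the comments.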
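Here is a small sketch of the "last set" behaviour on a string with two numeric groups in parens. The input is made up, and perl = TRUE is my own addition so that the example relies on PCRE's backtracking rules rather than the default engine.

```r
# Hypothetical input with two parenthesised numbers
z <- "Some Region (111) Subregion (222) RDA 03030"

# perl = TRUE is an assumption added for this sketch (PCRE engine).
# The greedy leading .+ grabs as much as it can, then backs off only as far
# as needed, so the group lands on the last "(digits)" in the string.
last_id <- gsub('.+\\(([0-9]+)\\).+?$', '\\1', z, perl = TRUE)
print(last_id)
```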
I could also use * instead of +, which would mean zero or more rather than one or more, in case your paren string comes at the beginning or end of a string.
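The difference shows up when the number in parens touches an edge of the string. A minimal sketch, with made-up inputs:

```r
# Hypothetical inputs: the paren string is at the start of one, the end of the other
y <- c("(5901035) RDA 01011", "East Kootenay C (5933039)")

# With + on both sides, at least one character is required before and after
# the parens, so neither string matches and gsub returns them unchanged
plus_result <- gsub('.+\\(([0-9]+)\\).+?$', '\\1', y)

# With * instead, zero characters are allowed on either side
star_result <- gsub('.*\\(([0-9]+)\\).*?$', '\\1', y)

print(plus_result)
print(star_result)
```

Note that when the pattern does not match, gsub quietly hands back the original string, so it is worth checking the result for leftovers.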
The second argument to gsub is what I am replacing the matched portion with. I used \\1, which says to use group 1 (the stuff inside the ( ) from above). The backslash is doubled again here: \1 is the backreference, and its backslash has to be escaped for R.
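The replacement can mix the backreference with ordinary text. A short sketch reusing the two strings from the top of the answer; the "ID " prefix is just an illustration.

```r
x <- c("East Kootenay C (5901035) RDA 01011",
       "Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020")

# '\\1' is stored as the two characters \1, which gsub reads as "group 1"
backref_len <- nchar('\\1')

# Everything matched is replaced by the literal text "ID " plus group 1
labelled <- gsub('.+\\(([0-9]+)\\).+?$', 'ID \\1', x)
print(labelled)
```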
Clear as mud to be sure! Enjoy your data munging project!