I have a vector filled with strings of the following format:
The first entries of the vector look like:
You can use gregexpr from base R together with a Perl-compatible pattern. This works:
> s <- "199719982001"
> gregexpr("^\\d{4}|\\d{1}(?<=\\d{3}$)",s,perl=TRUE)
[[1]]
[1] 1 12
attr(,"match.length")
[1] 4 1
attr(,"useBytes")
[1] TRUE
Note the perl=TRUE setting: the lookbehind (?<=...) is only supported by the Perl-compatible (PCRE) engine, not by R's default one. For more details, see ?regex.
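gregexpr only reports where the matches start and how long they are. To get the matched text itself, the match data can be passed to regmatches from base R. A minimal sketch using the same example string and pattern as above:

```r
s <- "199719982001"

# gregexpr() returns only the start positions and lengths of the matches.
m <- gregexpr("^\\d{4}|\\d{1}(?<=\\d{3}$)", s, perl = TRUE)

# regmatches() uses that match data to cut the matched substrings out of s;
# it returns a list with one character vector per input string.
parts <- regmatches(s, m)
parts[[1]]  # c("1997", "1")
```

Because the result is a list with one element per input string, the same two calls work unchanged on a whole vector of strings.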
Judging from the output, though, your regular expression does not capture id1.
Since this is a fixed format, why not use substr? year1 can be extracted with substr(s, 1, 4), id1 with substr(s, 9, 9), and id2 with as.numeric(substr(s, 10, 13)). In the last case, as.numeric converts the zero-padded string to a number, which drops the leading zeros.
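substr is vectorised, so the same calls can be applied to the whole vector at once and collected into a data frame. A sketch under the assumption that every string has the 13-character layout implied above (year1 in positions 1-4, id1 in position 9, a zero-padded id2 in positions 10-13); the example strings in x are hypothetical, not taken from the question:

```r
# Hypothetical input strings, assuming the fixed 13-character layout.
x <- c("1997199820012", "2001200230105")

# Each substr() call processes every element of x in one go.
parsed <- data.frame(
  year1 = substr(x, 1, 4),
  id1   = substr(x, 9, 9),
  # as.numeric() drops the zero padding, e.g. "0012" becomes 12
  id2   = as.numeric(substr(x, 10, 13)),
  stringsAsFactors = FALSE
)
print(parsed)
```

If some strings might be shorter than 13 characters, substr silently returns shorter (or empty) pieces, so checking nchar(x) first may be worthwhile.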
You can use sub.
# Keep year1 (\\1), id1 (\\2) and id2 without its leading zeros (\\3)
sub("^(.{4}).{4}(.)0*([0-9]+)$", "\\1\\2\\3", s, perl = TRUE)
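The capture groups of a sub pattern can also be pulled out one at a time by changing only the replacement string, which gives each field as its own vector. A sketch, assuming the same 13-character layout as the substr answer (year1 in 1-4, a skipped field in 5-8, id1 in 9, zero-padded id2 in 10-13); the strings in x and the variable name pattern are hypothetical:

```r
# Hypothetical strings, assuming the fixed 13-character layout.
x <- c("1997199820012", "2001200230105", "2005200610100")

# 0* sits outside the third group, so that group sees id2 without its
# leading zeros; with perl = TRUE, 0* backtracks so "0000" still yields "0".
pattern <- "^(.{4}).{4}(.)0*([0-9]+)$"

year1 <- sub(pattern, "\\1", x, perl = TRUE)
id1   <- sub(pattern, "\\2", x, perl = TRUE)
id2   <- as.integer(sub(pattern, "\\3", x, perl = TRUE))
```

Strings that do not match the pattern are returned unchanged by sub, so if the input might be malformed it is worth checking grepl(pattern, x, perl = TRUE) first.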