R Regular Expression Lookbehind

夕颜 2020-12-11 05:34

I have a vector filled with strings of the following format:

The first entries of the vector look like:

3 Answers
  • 2020-12-11 06:23

    You will need to use gregexpr from the base package. This works:

    > s <- "199719982001"
    > gregexpr("^\\d{4}|\\d{1}(?<=\\d{3}$)",s,perl=TRUE)
    [[1]]
    [1]  1 12
    attr(,"match.length")
    [1] 4 1
    attr(,"useBytes")
    [1] TRUE
    

    Note the perl=TRUE setting: lookbehind is only supported by the Perl-compatible engine. For more details, see ?regex.

    Judging from the output, though, your regular expression does not catch id1.
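
    As a sketch of how those match positions can be turned into the matched substrings, base R's regmatches can be applied to the same gregexpr result. The variable names m and matches are introduced here only for illustration:

    ```r
    s <- "199719982001"
    # Same pattern as above; perl = TRUE is needed for the lookbehind
    m <- gregexpr("^\\d{4}|\\d{1}(?<=\\d{3}$)", s, perl = TRUE)
    # regmatches returns a list with one character vector per input string
    matches <- regmatches(s, m)[[1]]
    matches
    # "1997" "1"
    ```

    This agrees with the positions (1 and 12) and lengths (4 and 1) reported above.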

  • 2020-12-11 06:27

    Since this is a fixed format, why not use substr? year1 is extracted using substr(s,1,4), id1 using substr(s,9,9), and id2 using as.numeric(substr(s,10,13)). In the last case I used as.numeric to get rid of the leading zeros.
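
    A minimal sketch of this approach, assuming each string has 13 characters laid out as year1 (4), year2 (4), id1 (1), id2 (4). The sample values are hypothetical, since the question's data is not shown here:

    ```r
    # Hypothetical 13-character inputs; the field layout is an assumption
    s <- c("1997199810023", "2001200320150")

    year1 <- substr(s, 1, 4)               # characters 1 to 4
    id1   <- substr(s, 9, 9)               # character 9
    id2   <- as.numeric(substr(s, 10, 13)) # "0023" becomes 23

    data.frame(year1, id1, id2)
    ```

    substr is vectorized, so the same calls work on the whole vector at once.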

  • 2020-12-11 06:27

    You can use sub.

    sub("^(.{4}).{4}(.{1}).*([1-9]{1,3})$","\\1\\2\\3",s)
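
    One thing to watch for: because .* is greedy, the last capture group in the pattern above may end up holding only the final digit, and a string ending in 0 may not match at all. The following is a hedged variant, not the original answer's method. It assumes the same hypothetical 13-character layout (year1, year2, id1, id2 with leading zeros) and inserts spaces in the replacement so the fields stay distinguishable:

    ```r
    # Hypothetical 13-character inputs; the field layout is an assumption
    s <- c("1997199810023", "2001200320150")

    # Skip leading zeros explicitly, then capture the remaining digits.
    # perl = TRUE is used here so that greedy matching is well defined.
    parts <- sub("^(.{4}).{4}(.)0*([0-9]+)$", "\\1 \\2 \\3", s, perl = TRUE)
    parts
    # c("1997 1 23", "2001 2 150")
    ```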
    