Separating column using separate (tidyr) via dplyr on a first encountered digit

北恋 2021-01-11 10:51

I'm trying to separate a rather messy column into two columns containing period and description. My data resembles the extract below:

set.         


        
2 answers
  •  梦毁少年i
    2021-01-11 11:37

    I think this might do it.

    library(tidyr)
    separate(dta, indicator, c("indicator", "period"), "(?<=[a-z]) ?(?=[0-9])")
    #           indicator   period    values
    # 1     someindicator     2001 0.2655087
    # 2     someindicator     2011 0.3721239
    # 3         some text 20022008 0.5728534
    # 4 another indicator     2003 0.9082078
    

    The following is an explanation of the regular expression, brought to you by regex101.

    • (?<=[a-z]) is a positive lookbehind - it asserts that a single lowercase character in the range a to z (case sensitive) sits immediately before the current position, without consuming it
    • " ?" (a space followed by ?) matches the space character literally, between zero and one time, as many times as possible, giving back as needed; this optional space is the only text the separator consumes, so it is dropped from the result
    • (?=[0-9]) is a positive lookahead - it asserts that a single character in the range 0 to 9 follows immediately, without consuming it
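
    To see where the pattern above actually matches, here is a minimal base R sketch. It uses regexpr() with perl = TRUE instead of tidyr::separate(), so it is not the answer's method, only an illustration of the same regular expression. The input strings are an assumption reconstructed from the printed output above, and the names x, pattern and m are hypothetical.

    ```r
    # Hypothetical inputs, assumed from the output shown in the answer
    x <- c("someindicator2001", "some text 20022008", "another indicator 2003")

    # Same pattern as in the answer; perl = TRUE enables lookarounds in base R
    pattern <- "(?<=[a-z]) ?(?=[0-9])"

    # Position and length of the first match in each string
    # (length 0 when there is no space between the letter and the digit)
    m <- regexpr(pattern, x, perl = TRUE)

    # Text before the match is the description, text after it is the period
    indicator <- substr(x, 1, m - 1)
    period <- substr(x, m + attr(m, "match.length"), nchar(x))

    print(data.frame(indicator, period))
    ```

    Because both lookarounds are zero-width, only the optional space is removed. That is why "someindicator2001" splits cleanly even though there is nothing between the last letter and the first digit.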
