Split string based on alternating character in R

醉话见心 2021-01-30 10:02

I'm trying to figure out an efficient way to go about splitting a string like

"111110000011110000111000"

into a vector

[1] "11111" "00000" "1111"  "0000"  "111"   "000"


        
9 answers
  •  谎友^ (original poster)
    2021-01-30 10:38

    Another way would be to add whitespace wherever the digit changes. This works for any digits, not just 1s and 0s. Then use strsplit on the whitespace:

    x <- "111110000011110000111000"
    
    (y <- gsub('(\\d)(?!\\1)', '\\1 ', x, perl = TRUE))
    # [1] "11111 00000 1111 0000 111 000 "
    
    
    strsplit(y, ' ')[[1]]
    # [1] "11111" "00000" "1111"  "0000"  "111"   "000"  
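
    To check the claim that this is not limited to 1s and 0s, here is a minimal sketch on a made-up input (`x2` and the other names are just for illustration). The pattern has only one capture group, so the replacement only needs `\\1` plus a space:

    ```r
    # Made-up input that uses digits other than 0 and 1
    x2 <- "7777444477444"

    # Put a space after every digit that is not followed by the same digit
    y2 <- gsub("(\\d)(?!\\1)", "\\1 ", x2, perl = TRUE)

    # Split on the spaces; strsplit drops the empty piece after the trailing space
    runs2 <- strsplit(y2, " ")[[1]]
    print(runs2)
    # [1] "7777" "4444" "77"   "444"
    ```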
    

    Or more succinctly as @akrun points out:

    strsplit(x, '(?<=(\\d))(?!\\1)', perl=TRUE)[[1]]
    # [1] "11111" "00000" "1111"  "0000"  "111"   "000"  
    

    Changing \\d to \\w also works, so the same pattern handles letters as well as digits:

    x  <- "aaaaabbcccccccbbbad"
    strsplit(x, '(?<=(\\w))(?!\\1)', perl=TRUE)[[1]]
    # [1] "aaaaa"   "bb"      "ccccccc" "bbb"     "a"       "d"      
    
    x <- "111110000011110000111000"
    strsplit(x, '(?<=(\\w))(?!\\1)', perl=TRUE)[[1]]
    # [1] "11111" "00000" "1111"  "0000"  "111"   "000" 
    

    You could also use \\K (rather than explicitly writing the captured digit back in the replacement), which I don't see used a lot, nor do I know how to explain it well :}

    AFAIK, \\K resets the starting point of the reported match, so any previously consumed characters are no longer included in it. Everything matched up to that point still has to match, but it is left out of what gets replaced.
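
    A tiny sketch of what that means in practice, on a toy string that is not from the question (the variable names are made up):

    ```r
    # Without \K the whole match "ab" is replaced
    whole <- sub("ab", "X", "ab", perl = TRUE)
    print(whole)
    # [1] "X"

    # With \K the "a" must still match, but it is dropped from the reported
    # match, so only "b" is replaced
    keep_a <- sub("a\\Kb", "X", "ab", perl = TRUE)
    print(keep_a)
    # [1] "aX"
    ```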

    x <- "1111100000222000333300011110000111000"
    (z <- gsub('(\\d)\\K(?!\\1)', ' ', x, perl = TRUE))
    # [1] "11111 00000 222 000 3333 000 1111 0000 111 000 "
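
    To get from that string to the vector, the same strsplit step as before should do. A sketch, assuming the gsub result shown above (`runs` is just an illustrative name):

    ```r
    x <- "1111100000222000333300011110000111000"

    # Insert a space after each run, using \K so no backreference is needed
    # in the replacement
    z <- gsub("(\\d)\\K(?!\\1)", " ", x, perl = TRUE)

    # Split on the spaces; the trailing space does not produce an empty element
    runs <- strsplit(z, " ")[[1]]
    print(runs)
    #  [1] "11111" "00000" "222"   "000"   "3333"  "000"   "1111"  "0000"  "111"   "000"
    ```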
    
