I'm trying to figure out an efficient way to go about splitting a string like
"111110000011110000111000"
into a vector
[1] \
Another way would be to insert whitespace wherever the digit changes and then use strsplit on the whitespace. This works for any digits, not just 1s and 0s:
x <- "111110000011110000111000"
# the pattern has only one capture group, so the replacement needs only \\1
(y <- gsub('(\\d)(?!\\1)', '\\1 ', x, perl = TRUE))
# [1] "11111 00000 1111 0000 111 000 "
strsplit(y, ' ')[[1]]
# [1] "11111" "00000" "1111" "0000" "111" "000"
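Or more succinctly, as @akrun points out, split directly at each zero-width position that follows a digit and is not followed by that same digit: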
strsplit(x, '(?<=(\\d))(?!\\1)', perl=TRUE)[[1]]
# [1] "11111" "00000" "1111" "0000" "111" "000"
Changing \\d to \\w also works, so the same pattern handles letters as well as digits:
x <- "aaaaabbcccccccbbbad"
strsplit(x, '(?<=(\\w))(?!\\1)', perl=TRUE)[[1]]
# [1] "aaaaa" "bb" "ccccccc" "bbb" "a" "d"
x <- "111110000011110000111000"
strsplit(x, '(?<=(\\w))(?!\\1)', perl=TRUE)[[1]]
# [1] "11111" "00000" "1111" "0000" "111" "000"
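The pattern can be wrapped in a small helper and sanity-checked against base R's rle(), which computes run lengths without any regex. This is only a sketch: the name split_runs is made up for illustration, and the rle() comparison is a different technique from the one above, used here purely as a cross-check.

```r
# Hypothetical helper (name is illustrative): split a string into runs
# of identical word characters using the lookbehind/lookahead pattern.
split_runs <- function(s) {
  strsplit(s, '(?<=(\\w))(?!\\1)', perl = TRUE)[[1]]
}

x <- "111110000011110000111000"
runs <- split_runs(x)

# Cross-check: rle() on the individual characters gives the run values
# and lengths, and strrep() rebuilds each run from them.
r <- rle(strsplit(x, '')[[1]])
identical(strrep(r$values, r$lengths), runs)  # TRUE
all(nchar(runs) == r$lengths)                 # TRUE
```

If the regex ever becomes hard to read, the rle() version on its own is a reasonable fallback, at the cost of first splitting the string into single characters.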
You could also use \\K rather than explicitly writing backreferences in the replacement. I don't see it used a lot, and it is a bit hard to explain :}
AFAIK \\K resets the starting point of the reported match: everything consumed before it still has to match, but is no longer part of what gets reported or replaced. Here the digit is kept, and only the zero-width position after it is replaced with a space:
x <- "1111100000222000333300011110000111000"
(z <- gsub('(\\d)\\K(?!\\1)', ' ', x, perl = TRUE))
# [1] "11111 00000 222 000 3333 000 1111 0000 111 000 "
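To see what \\K does in isolation, here is a minimal sketch with made-up strings ("foobar" is just an illustration). It then finishes the split from the space-separated result above.

```r
# \K in isolation: "foo" must be present for the match to succeed,
# but only "bar" is reported, so only "bar" gets replaced.
sub('foo\\Kbar', 'X', 'foobar', perl = TRUE)
# [1] "fooX"

# Finishing the split from the space-separated string
x <- "1111100000222000333300011110000111000"
z <- gsub('(\\d)\\K(?!\\1)', ' ', x, perl = TRUE)
parts <- strsplit(z, ' ')[[1]]
length(parts)
# [1] 10
```

The trailing space in z does no harm here, since strsplit does not return an empty final element when the string ends with a separator.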