Split string based on alternating character in R

醉话见心 2021-01-30 10:02

I'm trying to figure out an efficient way to split a string like

"111110000011110000111000"

into a vector

[1] \         


        
9 answers
  •  耶瑟儿~
    2021-01-30 10:35

    A simple for-loop solution:

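    x="aaaaabbcccccccbbbad1111100000222aaabbccd11DaaBB"
    # start the result with the first character
    res_vector=substr(x,1,1)
    
    # seq_len(nchar(x))[-1] is empty for a one-character string,
    # whereas 2:nchar(x) would count backwards (2, 1)
    for (i in seq_len(nchar(x))[-1]) {
      tmp=substr(x,i,i)
      if (tmp==substr(x,i-1,i-1)) {
        # same character as the previous one: extend the current run
        res_vector[length(res_vector)]=paste0(res_vector[length(res_vector)],tmp)
      } else {
        # different character: start a new run
        res_vector[length(res_vector)+1]=tmp
      }
    }
    
    res_vector
    
    #[1] "aaaaa"  "bb"  "ccccccc"  "bbb"  "a"  "d"  "11111"  "00000"  "222"  "aaa"  "bb"  "cc"  "d"  "11"  "D"  "aa"  "BB"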
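
The same loop can be wrapped in a function and applied to the string from the question. This is only a minimal sketch: the name `split_runs` is hypothetical and does not come from the answer, and the empty-string guard is an added assumption.

```r
# Hypothetical helper (name not from the original answer): split a string
# into runs of identical consecutive characters, using the same loop.
split_runs <- function(x) {
  n <- nchar(x)
  if (n == 0) return(character(0))   # assumption: empty input gives an empty result
  res <- substr(x, 1, 1)
  for (i in seq_len(n)[-1]) {        # empty loop when n == 1
    tmp <- substr(x, i, i)
    if (tmp == substr(x, i - 1, i - 1)) {
      # same character as before: extend the current run
      res[length(res)] <- paste0(res[length(res)], tmp)
    } else {
      # different character: start a new run
      res[length(res) + 1] <- tmp
    }
  }
  res
}

split_runs("111110000011110000111000")
# [1] "11111" "00000" "1111"  "0000"  "111"   "000"
```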
    

    Or, maybe a little bit faster, with a pre-allocated results vector:

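    x="aaaaabbcccccccbbbad1111100000222aaabbccd11DaaBB"
    # pre-allocate one slot per character (the worst case: no repeats at all)
    res_vector=rep(NA_character_,nchar(x))
    res_vector[1]=substr(x,1,1)
    counter=1
    # seed with the first character so a repeat at position 2 extends the first run
    old_tmp=substr(x,1,1)
    
    for (i in seq_len(nchar(x))[-1]) {
      tmp=substr(x,i,i)
      if (tmp==old_tmp) {
        # same character as before: extend the current run
        res_vector[counter]=paste0(res_vector[counter],tmp)
      } else {
        # different character: start a new run in the next slot
        res_vector[counter+1]=tmp
        counter=counter+1
      }
      old_tmp=tmp
    }
    
    # drop the unused pre-allocated slots
    res_vector[!is.na(res_vector)]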
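
For comparison, here is a vectorized sketch that avoids the explicit loop. Note that this is a different technique from the loops in this answer: it relies on base R's `rle()` to find the runs and `strrep()` (available from R 3.3.0) to rebuild them. The function name `split_runs_rle` is hypothetical.

```r
# Hypothetical helper (not part of the original answer): split a string into
# runs of identical consecutive characters via run-length encoding.
split_runs_rle <- function(x) {
  chars <- strsplit(x, "")[[1]]       # single characters
  runs <- rle(chars)                  # values = run characters, lengths = run sizes
  strrep(runs$values, runs$lengths)   # repeat each character by its run length
}

split_runs_rle("111110000011110000111000")
# [1] "11111" "00000" "1111"  "0000"  "111"   "000"
```

Whether this is actually faster than the loops depends on the input length, so it is worth benchmarking on your own data.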
    
