Split character string multiple times every two characters

Submitted by 早过忘川 on 2019-12-17 20:37:26

Question


I have a character column in my data frame that looks like this:

df <- data.frame(a = c("AaBbCC", "AABBCC", "AAbbCC"))
> df
       a
1 AaBbCC
2 AABBCC
3 AAbbCC

I would like to split this column every two characters, so in this case I want three columns named VA, VB and VC. I tried:

library(tidyr)
library(dplyr)
df <- data.frame(a = c("AaBbCC", "AABBCC", "AAbbCC")) %>%
  separate(a, c(paste("V", LETTERS[1:3], sep = "")), sep = c(2, 2))
> df
  VA VB   VC
1 Aa    BbCC
2 AA    BBCC
3 AA    bbCC

but this is not the desired result. I would like the content that is now in VC to be split into VB (all the B letters) and VC (all the C letters). How do I get R to split every two characters? The length of the string in the column is always the same for every row (6 in this example). I will also have strings of length >10.


Answer 1:


You were actually quite close. You need to specify the separator-positions as sep = c(2,4) instead of sep = c(2,2):

df <- separate(df, a, c(paste0("V",LETTERS[1:3])),sep = c(2,4))

you get:

> df
  VA VB VC
1 Aa Bb CC
2 AA BB CC
3 AA bb CC
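
For the longer strings mentioned in the question, the cut positions do not have to be typed by hand. The following is only a sketch under some assumptions: every row has the same even length of at least 4, and the 10-character sample values and the names n, cuts and cols are made up for illustration.

```r
library(tidyr)

# Hypothetical 10-character sample data (not from the question)
df10 <- data.frame(a = c("AaBbCCDdEE", "AABBCCDDEE"), stringsAsFactors = FALSE)

n    <- nchar(df10$a[1])                      # assumed identical for every row
cuts <- seq(2, n - 2, by = 2)                 # 2, 4, 6, 8
cols <- paste0("V", LETTERS[seq_len(n / 2)])  # VA ... VE

# sep needs one fewer position than there are target columns
res <- separate(df10, a, into = cols, sep = cuts)
```

With 6-character strings the same expression yields c(2, 4), i.e. the positions used above.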

In base R, starting from the original df (with column a, before it is overwritten by separate()), you could do (borrowing from @rawr's comment):

l <- ave(as.character(df$a), FUN = function(x) strsplit(x, '(?<=..)', perl = TRUE))
df <- data.frame(do.call('rbind', l))

which gives:

> df
  X1 X2 X3
1 Aa Bb CC
2 AA BB CC
3 AA bb CC
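
The strsplit() call above appears to depend on how R handles zero-length lookbehind matches. If you prefer to avoid that, a different base R route is substring() with explicit start positions. This is a minimal sketch, not the approach from the comment; split_every is a hypothetical helper name and it assumes all strings have the same length.

```r
# Hypothetical helper: cut each string into pieces of `width` characters
split_every <- function(x, width = 2) {
  x <- as.character(x)                    # also handles factor columns
  starts <- seq(1, nchar(x[1]), by = width)
  # one character vector of pieces per input string, stacked as rows
  pieces <- do.call(rbind, lapply(x, substring,
                                  first = starts,
                                  last  = starts + width - 1))
  out <- as.data.frame(pieces, stringsAsFactors = FALSE)
  names(out) <- paste0("V", LETTERS[seq_along(starts)])
  out
}

res <- split_every(c("AaBbCC", "AABBCC", "AAbbCC"))
```

Because the start positions are computed from nchar(), the same function works for longer strings without changes (up to 26 pieces with this naming scheme).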



Answer 2:


We could do this with base R:

read.csv(text=gsub('(..)(?!$)', '\\1,', df$a, 
    perl=TRUE),col.names=paste0("V", LETTERS[1:3]), header=FALSE)
#  VA VB VC
#1 Aa Bb CC
#2 AA BB CC
#3 AA bb CC
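
To see why this works, it can help to look at the intermediate strings. The pattern (..)(?!$) matches each pair of characters that is not at the end of the string and appends a comma to it, so read.csv() receives ordinary comma-separated lines. A small sketch, with x standing in for df$a:

```r
x <- c("AaBbCC", "AABBCC", "AAbbCC")

# Insert a comma after every two characters, except after the last pair
csv_lines <- gsub("(..)(?!$)", "\\1,", x, perl = TRUE)
csv_lines
# [1] "Aa,Bb,CC" "AA,BB,CC" "AA,bb,CC"
```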

If we are reading directly from a file, another option is read.fwf:

read.fwf(file="yourfile.txt", widths=c(2,2,2), skip=1)
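
A self-contained way to try this is with a temporary file. This is a sketch under assumptions: the file contents are made up, and the first line is taken to be a header (the column name a), which is why skip = 1 is used.

```r
# Write a hypothetical fixed-width file: a header line, then the data
tmp <- tempfile(fileext = ".txt")
writeLines(c("a", "AaBbCC", "AABBCC", "AAbbCC"), tmp)

# Three fields of width 2; skip the header line and keep text as character
res <- read.fwf(tmp, widths = c(2, 2, 2), skip = 1,
                col.names = paste0("V", LETTERS[1:3]),
                colClasses = "character")

unlink(tmp)  # clean up the temporary file
```

For longer strings, widths = rep(2, n / 2) would cover a line of n characters.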


Source: https://stackoverflow.com/questions/34695136/split-character-string-multiple-times-every-two-characters
