How to split a dataframe column by the first instance of a character in its values

Question

I have a dataframe (or vector?)

x <- data.frame(a=c("A_B_D", "B_C"))

I want to split the vector in x$a into two new columns by the first instance of "_" to get

x$b
[1] "A" "B"

and

x$c
[1] "B_D" "C"

I tried variants of gsub, but couldn't come up with a solution.

Answer 1:

Another option might be to use tidyr::separate:

library(tidyr)
separate(x, a, into = c("b", "c"), sep = "_", remove = FALSE, extra = "merge")
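
As a minimal sketch of what that call returns, assuming tidyr is installed: extra = "merge" tells separate to split at most length(into) - 1 times, so everything after the first "_" stays together in the last column. The name res is only for illustration.

```r
library(tidyr)

x <- data.frame(a = c("A_B_D", "B_C"))

# extra = "merge" keeps everything after the first "_" in the last column;
# remove = FALSE keeps the original column a alongside the new ones
res <- separate(x, a, into = c("b", "c"), sep = "_",
                remove = FALSE, extra = "merge")

res$b  # "A" "B"
res$c  # "B_D" "C"
```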

Answer 2:

One idea is to replace the first _ with another delimiter and then split on that new delimiter. The new delimiter (a comma here) must not already occur in the values. This works because sub replaces only the first match in each string (whereas gsub replaces all of them), i.e.

strsplit(sub('_', ',', x$a), ',', fixed = TRUE)
#[[1]]
#[1] "A"   "B_D"

#[[2]]
#[1] "B" "C"
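
To make the difference between sub and gsub concrete, here is a small sketch on the same values. The names vals, first_only and all_matches are only for illustration.

```r
vals <- c("A_B_D", "B_C")

# sub replaces only the first "_" in each string
first_only <- sub("_", ",", vals)
first_only   # "A,B_D" "B,C"

# gsub replaces every "_" in each string
all_matches <- gsub("_", ",", vals)
all_matches  # "A,B,D" "B,C"
```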

To create two new columns in your original data frame,

within(x, new <- data.frame(do.call(rbind, strsplit(sub('_', ',', x$a), ',', fixed = TRUE))))
#      a new.X1 new.X2
#1 A_B_D      A    B_D
#2   B_C      B      C
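
If plain columns named b and c are preferred over the nested new.X1 and new.X2, one possible base R variant (a sketch, not part of the answer above) uses two sub calls with regular expressions and avoids the temporary delimiter altogether. It assumes every value contains at least one "_". A value without one would be copied unchanged into both columns.

```r
x <- data.frame(a = c("A_B_D", "B_C"))

# drop the first "_" and everything after it: the part before the first "_"
x$b <- sub("_.*$", "", x$a)

# drop everything up to and including the first "_": the part after it
x$c <- sub("^[^_]*_", "", x$a)

x
#       a b   c
# 1 A_B_D A B_D
# 2   B_C B   C
```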

Source: https://stackoverflow.com/questions/55748363/how-to-split-a-dataframe-column-by-the-first-instance-of-a-character-in-its-valu

Tags

regex
