Question
I am "converting" from data.frame to data.table
I now have a data.table:
library(data.table)
DT = data.table(ID = c("ab_cd.de","ab_ci.de","fb_cd.de","xy_cd.de"))
DT
ID
1: ab_cd.de
2: ab_ci.de
3: fb_cd.de
4: xy_cd.de
new_DT<- data.table(matrix(ncol = 2))
colnames(new_DT)<- c("test1", "test2")
I would like to first delete ".de" from the end of every entry, then split every entry at the underscore and save the two parts in two new columns. The final output should look like this:
test1 test2
1 ab cd
2 ab ci
3 fb cd
4 xy cd
With a data.frame I did:
df = data.frame(ID = c("ab_cd.de","ab_ci.de","fb_cd.de","xy_cd.de"))
df
ID
1 ab_cd.de
2 ab_ci.de
3 fb_cd.de
4 xy_cd.de
df[,1] <- gsub("\\.de$", "", df[,1])  # "\\." is a literal dot, "$" anchors the match to the end
df
ID
1 ab_cd
2 ab_ci
3 fb_cd
4 xy_cd
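Note that gsub() treats its pattern as a regular expression by default, so an unescaped "." matches any single character rather than a literal dot: ".de" would also strip something like "cde" from the middle of a string. It happens not to matter for the four IDs in the question, but a minimal sketch with a made-up value (hypothetical, not from the question) shows the difference between the options:

```r
x <- "abcde_x.de"  # hypothetical input with "cde" in the middle

# Unescaped: "." matches any character, so "cde" is removed as well as ".de"
unescaped <- gsub(".de", "", x)

# Escaped and anchored: only a literal ".de" at the very end is removed
escaped <- gsub("\\.de$", "", x)

# fixed = TRUE matches ".de" literally, but anywhere in the string (no anchoring)
literal <- gsub(".de", "", x, fixed = TRUE)

unescaped  # "ab_x"
escaped    # "abcde_x"
literal    # "abcde_x"
```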
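library(stringr)  # str_split_fixed() comes from stringr
new_df <- data.frame(matrix(ncol = 2))  # placeholder, filled row by row below
colnames(new_df) <- c("test1", "test2")
for (i in seq_len(nrow(df))) {
  # split e.g. "ab_cd" into "ab" and "cd" and store them as row i
  new_df[i, ] <- str_split_fixed(df[i, 1], "_", 2)
}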
new_df
test1 test2
1 ab cd
2 ab ci
3 fb cd
4 xy cd
Any help is appreciated!
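As a side note, str_split_fixed() is vectorised, so the row-by-row loop is not strictly needed: called on the whole column it returns a character matrix with one row per entry and n columns. A minimal sketch, assuming stringr is installed and starting from the suffix-free strings:

```r
library(stringr)  # provides str_split_fixed()

ids <- c("ab_cd", "ab_ci", "fb_cd", "xy_cd")

# One call splits every element; n = 2 yields exactly two columns
parts <- str_split_fixed(ids, "_", 2)

new_df <- data.frame(test1 = parts[, 1], test2 = parts[, 2],
                     stringsAsFactors = FALSE)
new_df
```

This builds the result in one step instead of growing new_df one row at a time.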
Answer 1:
You can use tstrsplit to split the column into two after removing the suffix (.de) with sub:
DT[, c("test1", "test2") := tstrsplit(sub("\\.de", "", ID), "_")][, ID := NULL][]
# test1 test2
#1: ab cd
#2: ab ci
#3: fb cd
#4: xy cd
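For readers coming from strsplit(): tstrsplit() is essentially a transposed strsplit(). Instead of one vector of pieces per element, it returns a list with one vector per position, which is the list-of-columns shape that := expects on its right-hand side. A minimal sketch, assuming data.table is installed; passing fixed = TRUE (forwarded to strsplit()) and keeping the ID column are illustrative choices, not part of the answer:

```r
library(data.table)

ids <- c("ab_cd", "ab_ci", "fb_cd", "xy_cd")

# strsplit(): one character vector per input element
strsplit(ids, "_")[[1]]  # "ab" "cd"

# tstrsplit(): one vector per split position (all first pieces, all second pieces)
pieces <- tstrsplit(ids, "_")
pieces[[1]]  # "ab" "ab" "fb" "xy"
pieces[[2]]  # "cd" "ci" "cd" "cd"

# Because the result is a list of columns, := can add both columns by reference
DT <- data.table(ID = c("ab_cd.de", "ab_ci.de", "fb_cd.de", "xy_cd.de"))
DT[, c("test1", "test2") := tstrsplit(sub("\\.de$", "", ID), "_", fixed = TRUE)]
DT
```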
Answer 2:
We can use extract from tidyr:
library(tidyr)
df %>%
extract(ID, into = c('test1', 'test2'), '([^_]+)_([^.]+).*')
# test1 test2
#1 ab cd
#2 ab ci
#3 fb cd
#4 xy cd
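extract() produces one output column per capture group in the regular expression. To see what '([^_]+)_([^.]+).*' captures, the same pattern can be run through base R's regexec() and regmatches(), which need no extra packages. This only illustrates the regex; it is not how tidyr implements extract():

```r
ids <- c("ab_cd.de", "ab_ci.de", "fb_cd.de", "xy_cd.de")
pattern <- "([^_]+)_([^.]+).*"

# regexec() records the full match plus one entry per capture group
m <- regmatches(ids, regexec(pattern, ids))

m[[1]]  # "ab_cd.de" "ab" "cd": full match, group 1, group 2

# Group 1 = text before "_", group 2 = text between "_" and the first "."
test1 <- vapply(m, `[`, character(1), 2)
test2 <- vapply(m, `[`, character(1), 3)
```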
Or, using data.table:
library(data.table)
DT[, .(test1 = sub('_.*', '', ID), test2 = sub('[^_]+_([^.]+)\\..*', '\\1', ID))]
# test1 test2
#1: ab cd
#2: ab ci
#3: fb cd
#4: xy cd
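Both sub() calls assume the input has the expected shape. The second pattern only matches when a "." follows the second part, and when sub() finds no match it returns the string unchanged rather than NA. A small sketch with a hypothetical fourth row that lacks the suffix (assuming data.table is installed) shows the effect, plus one possible adjustment that makes the suffix optional:

```r
library(data.table)

# Hypothetical data: the last ID has no ".de" suffix
DT2 <- data.table(ID = c("ab_cd.de", "ab_ci.de", "fb_cd.de", "xy_cd"))

res <- DT2[, .(test1 = sub('_.*', '', ID),
               test2 = sub('[^_]+_([^.]+)\\..*', '\\1', ID))]
res$test2   # "cd" "ci" "cd" "xy_cd"  (no ".", so no match and the input is returned)

# Dropping the required literal "." lets the same capture work with or without a suffix
res2 <- DT2[, .(test1 = sub('_.*', '', ID),
                test2 = sub('[^_]+_([^.]+).*', '\\1', ID))]
res2$test2  # "cd" "ci" "cd" "cd"
```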
Source: https://stackoverflow.com/questions/44217340/r-gsub-and-str-split-fixed-in-data-tables