I am looking for a way to efficiently apply a function to each row of a data.table. Let's consider the following data table:
library(data.table)
library(stringr)
x <- data.table(a = c(1:3, 1), b = c('12 13', '14 15', '16 17', '18 19'))
How about:
x
a b
1: 1 12 13
2: 2 14 15
3: 3 16 17
4: 1 18 19
x[,list(a=rep(a,each=2), V1=unlist(strsplit(b," ")))]
a V1
1: 1 12
2: 1 13
3: 2 14
4: 2 15
5: 3 16
6: 3 17
7: 1 18
8: 1 19
A generalized solution, following the comment, that also handles a different number of values in each row:
x[,{s=strsplit(b," ");list(a=rep(a,sapply(s,length)), V1=unlist(s))}]
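To check the generalized form on rows with different numbers of values, here is a minimal sketch. The table `y` is made up for illustration, and `lengths(s)` is base R's shorthand for `sapply(s, length)`; otherwise it is the same expression as above.

```r
library(data.table)

# Hypothetical sample table: rows hold two, one, and three space-separated values
y <- data.table(a = c(1, 2, 3), b = c("12 13", "14", "15 16 17"))

# Split once, then repeat each `a` as many times as its row produced pieces
res <- y[, {
  s <- strsplit(b, " ", fixed = TRUE)
  list(a = rep(a, lengths(s)), V1 = unlist(s))
}]
print(res)
```

Because the split is done once for the whole column, this stays vectorized and should not need a per-row `by`.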
The dplyr/tidyr approach also works with data tables.
library(dplyr)
library(tidyr)
x %>%
separate(b, into = c("b1", "b2")) %>%
gather(b, "V1", b1:b2) %>%
arrange(V1) %>%
select(a, V1)
Or, using the standard evaluation forms (the underscore-suffixed verbs, which are deprecated in recent dplyr and tidyr releases):
x %>%
separate_("b", into = c("b1", "b2")) %>%
gather_("b", "V1", c("b1", "b2")) %>%
arrange_(~ V1) %>%
select_(~ a, ~ V1)
The case of different numbers of values in the b column is only slightly more complicated.
library(stringr)
x2 <- data.table(
a = c(1:3, 1),
b = c('12 13', '14', '15 16 17', '18 19')
)
n <- max(str_count(x2$b, " ")) + 1
b_cols <- paste0("b", seq_len(n))
x2 %>%
separate_("b", into = b_cols, extra = "drop") %>%
gather_("b", "V1", b_cols) %>%
arrange_(~ V1) %>%
select_(~ a, ~ V1)
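If the goal is just one output row per value, tidyr's `separate_rows()` may be a shorter route, since it does not need the number of pieces up front. This is only a sketch under the assumption that a tidyr version providing `separate_rows()` is installed; the data frame `df` is made up for illustration.

```r
library(tidyr)

# Hypothetical sample data with a varying number of values per row
df <- data.frame(a = c(1, 2, 3), b = c("12 13", "14", "15 16 17"),
                 stringsAsFactors = FALSE)

# One output row per space-separated value; `a` is repeated as needed
long <- separate_rows(df, b, sep = " ")
print(long)
```

Unlike the separate/gather pipeline, this keeps the original row order and does not introduce NA padding for the shorter rows.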
Looking at the input and desired output, this should work:
x <- data.frame(a=c(1,2,3,1),b=c("12 13","14 15","16 17","18 19"))
data.frame(a=rep(x$a,each=2), new_b=unlist(strsplit(as.character(x$b)," ")))
x[, .(a,strsplit(b,' ')), by = .I]
looks more aesthetic
The most effective and idiomatic approach is to use a vectorized function. In this case, a regex will do what you want:
x[, V1 := gsub(" [[:alnum:]]*", "", b)]
a b V1
1: 1 12 13 12
2: 2 14 15 14
3: 3 16 17 16
4: 1 18 19 18
If you want to return each split component, and you know there are two in each one, you can use Map to coerce the result of strsplit into the correct form:
x[, c('b1','b2') := do.call(Map, c(f = c, strsplit(b, ' ')))]
x
a b b1 b2
1: 1 12 13 12 13
2: 2 14 15 14 15
3: 3 16 17 16 17
4: 1 18 19 18 19
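data.table also ships `tstrsplit()`, which splits and transposes in one step, so it can stand in for the `do.call(Map, ...)` construction. A minimal sketch, rebuilding the example table from the printed output above:

```r
library(data.table)

x <- data.table(a = c(1, 2, 3, 1), b = c("12 13", "14 15", "16 17", "18 19"))

# tstrsplit() splits each string and transposes the result,
# returning one vector per split position
x[, c("b1", "b2") := tstrsplit(b, " ", fixed = TRUE)]
print(x)
```

The new columns are character; `tstrsplit()` has a `type.convert` argument if numeric columns are wanted instead.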
x[, .(a,strsplit(b,' ')), by=1:nrow(x)]
by=1:nrow(x) is a simple way to force 1 row per by-group
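As far as I can tell, the per-row expression above keeps each row's pieces together in a list column. If the long format from the first answer is what you are after, a sketch along these lines should work; the grouping column name `row` is just an illustrative choice.

```r
library(data.table)

x <- data.table(a = c(1, 2, 3, 1), b = c("12 13", "14 15", "16 17", "18 19"))

# Each by-group is a single row; `a` (length 1) is recycled to match
# the number of pieces that row's string splits into
long <- x[, .(a, V1 = unlist(strsplit(b, " ", fixed = TRUE))),
          by = .(row = seq_len(nrow(x)))]
print(long)
```

Grouping by row is convenient but runs the split once per row, so on large tables the vectorized versions earlier in the thread are likely to be faster.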