Applying a function to each row of a data.table

日久生厌 2020-12-03 10:46

I am looking for a way to efficiently apply a function to each row of a data.table. Let's consider the following data table:

library(data.table)
library(stringr)         


        
7 answers
  • 2020-12-03 11:04

    How about:

    x
       a     b
    1: 1 12 13
    2: 2 14 15
    3: 3 16 17
    4: 1 18 19
    
    x[,list(a=rep(a,each=2), V1=unlist(strsplit(b," ")))]
       a V1
    1: 1 12
    2: 1 13
    3: 2 14
    4: 2 15
    5: 3 16
    6: 3 17
    7: 1 18
    8: 1 19
    

    A generalized solution, following the comment, for rows that hold different numbers of values:

    x[,{s=strsplit(b," ");list(a=rep(a,sapply(s,length)), V1=unlist(s))}]
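
    As a minimal, self-contained sketch of the same idea, the version below uses base R's `lengths()` in place of `sapply(s, length)`. The table `x2` and its values are assumptions chosen for illustration (rows with one, two, and three values); they are not the asker's data.

    ```r
    library(data.table)

    # Illustrative input (assumed): rows hold a varying number of values.
    x2 <- data.table(a = c(1:3, 1L),
                     b = c("12 13", "14", "15 16 17", "18 19"))

    # Split every string once, in a single vectorized call.
    s <- strsplit(x2$b, " ", fixed = TRUE)

    # Repeat each `a` once per piece of its row, then flatten the pieces.
    long <- data.table(a = rep(x2$a, lengths(s)), V1 = unlist(s))
    ```

    Because `strsplit()` is called once on the whole column, no per-row grouping is needed.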
    
  • 2020-12-03 11:04

    The dplyr/tidyr approach also works with data tables.

    library(dplyr)
    library(tidyr)
    x %>% 
      separate(b, into = c("b1", "b2")) %>% 
      gather(b, "V1", b1:b2) %>%
      arrange(V1) %>%
      select(a, V1)
    

    Or, using the standard evaluation forms (the underscore-suffixed verbs, which are deprecated in current dplyr and tidyr releases):

    x %>% 
      separate_("b", into = c("b1", "b2")) %>% 
      gather_("b", "V1", c("b1", "b2")) %>%
      arrange_(~ V1) %>%
      select_(~ a, ~ V1)
    

    The case of different numbers of values in the b column is only slightly more complicated.

    library(stringr)
    
    x2 <- data.table(
      a = c(1:3, 1), 
      b = c('12 13', '14', '15 16 17', '18 19')
    )
    
    n <- max(str_count(x2$b, " ")) + 1
    b_cols <- paste0("b", seq_len(n))
    x2 %>% 
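      # fill = "right" pads short rows with NA without a warning
      separate_("b", into = b_cols, extra = "drop", fill = "right") %>% 
      # na.rm = TRUE drops the padding rows again
      gather_("b", "V1", b_cols, na.rm = TRUE) %>%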
      arrange_(~ V1) %>%
      select_(~ a, ~ V1)
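
    For rows with varying numbers of values, tidyr's `separate_rows()` may be a shorter route than widening and gathering. This is a sketch only: it assumes a plain data.frame as input (behaviour on a data.table is not verified here), and the split values stay in column `b` rather than a new `V1`.

    ```r
    library(tidyr)

    # Illustrative input (assumed), built as a plain data.frame.
    x2 <- data.frame(a = c(1:3, 1L),
                     b = c("12 13", "14", "15 16 17", "18 19"),
                     stringsAsFactors = FALSE)

    # One output row per space-separated piece; `a` is repeated as needed.
    long <- separate_rows(x2, b, sep = " ")
    ```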
    
  • 2020-12-03 11:05

    Looking at the input and the desired output, this should work:

    x <- data.frame(a=c(1,2,3,1),b=c("12 13","14 15","16 17","18 19"))
    data.frame(a=rep(x$a,each=2), new_b=unlist(strsplit(as.character(x$b)," ")))
    
  • 2020-12-03 11:08
    x[, .(a,strsplit(b,' ')), by = .I]
    

    This looks more aesthetic.

  • 2020-12-03 11:11

    The most effective and idiomatic approach is to have a vectorized function.

    In this case, a regex will do what you want (here, keeping only the first value of each row):

    x[, V1 := gsub(" [[:alnum:]]*", "", b)]
    
       a     b V1
    1: 1 12 13 12
    2: 2 14 15 14
    3: 3 16 17 16
    4: 1 18 19 18
    

    If you want to return each split component, and you know there are two in each row, you can use Map to coerce the result of strsplit into the correct form:

    x[, c('b1','b2')  := do.call(Map, c(f = c, strsplit(b, ' ')))]
    
    
    
    x
       a     b b1 b2
    1: 1 12 13 12 13
    2: 2 14 15 14 15
    3: 3 16 17 16 17
    4: 1 18 19 18 19
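
    data.table also ships `tstrsplit()`, which splits and transposes in one call, so it may stand in for the `do.call(Map, ...)` construction. A minimal sketch, assuming the same four-row table as above:

    ```r
    library(data.table)

    # Same illustrative table as in the thread.
    x <- data.table(a = c(1, 2, 3, 1),
                    b = c("12 13", "14 15", "16 17", "18 19"))

    # tstrsplit() returns one vector per split position,
    # which := assigns to the new columns by reference.
    x[, c("b1", "b2") := tstrsplit(b, " ", fixed = TRUE)]
    ```

    Rows with fewer pieces than columns are padded with `NA` by default (the `fill` argument).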
    
  • 2020-12-03 11:19
    x[, .(a,strsplit(b,' ')), by=1:nrow(x)]
    

    by=1:nrow(x) is a simple way to force one row per by-group.
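
    The expression above returns the split values as a list column. As a hedged sketch, the same one-row-per-group trick can produce the long format directly by unlisting inside `j`; the grouping column name `row` is an illustrative choice, not part of the original answer.

    ```r
    library(data.table)

    x <- data.table(a = c(1, 2, 3, 1),
                    b = c("12 13", "14 15", "16 17", "18 19"))

    # Each group is exactly one row; the length-1 `a` is recycled
    # to match the number of pieces returned by strsplit().
    res <- x[, .(a, V1 = unlist(strsplit(b, " ", fixed = TRUE))),
             by = .(row = seq_len(nrow(x)))]

    # Drop the helper grouping column.
    res[, row := NULL]
    ```

    Grouping by row runs `j` once per row, so on large tables the vectorized approaches in the other answers are likely to be faster.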
