R: split string into numeric and return the mean as a new column in a data frame

予麋鹿 2020-12-11 08:38

I have a large data frame with columns that hold character strings of numbers, such as "1, 2, 3, 4". I wish to add a new column that is the average of these numbers. I have
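
For reference, the answers below appear to assume a data frame named `df` with a single character column `a`. A minimal sketch of such input, with values taken from the output printed in the first answer (the `stringsAsFactors = FALSE` argument is an assumption, added so the column stays character on older R versions):

```r
# Sample input assumed by the answers: one character column of
# comma-separated numbers.
df <- data.frame(
  a = c("1, 2, 3, 4", "2, 4, 6, 8", "3, 6, 9, 12"),
  stringsAsFactors = FALSE  # keep `a` as character, not factor
)
```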

3 Answers
  • 2020-12-11 09:15

    Try:

    library(dplyr)
    library(splitstackshape)
    
    df %>%
      mutate(index = row_number()) %>%
      cSplit("a", direction = "long") %>%
      group_by(index) %>%
      summarise(mean = mean(a))
    

    Which gives:

    #Source: local data table [3 x 2]
    #
    #  index mean
    #1     1  2.5
    #2     2  5.0
    #3     3  7.5
    

    Or as per @Ananda's suggestion:

    rowMeans(cSplit(df, "a"), na.rm = TRUE)
    # [1] 2.5 5.0 7.5
    

    If you want to keep the result in a data frame you could do:

    df %>% mutate(mean = rowMeans(cSplit(., "a"), na.rm = TRUE))
    

    Which gives:

    #            a mean
    #1  1, 2, 3, 4  2.5
    #2  2, 4, 6, 8  5.0
    #3 3, 6, 9, 12  7.5
    
  • 2020-12-11 09:16

    You could use sapply to loop through the list returned by strsplit, handling each of the list elements:

    # as.character() guards against `a` being stored as a factor
    sapply(strsplit(as.character(df$a), split = ", "), function(x) mean(as.numeric(x)))
    # [1] 2.5 5.0 7.5
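
Since the question asks for a new column rather than a bare vector, the same idea can be assigned straight back to the data frame. This is a minimal base R sketch, not the answerer's code: it assumes the sample `df` shown in the outputs above, and it swaps `sapply` for `vapply` so that each row is checked to yield exactly one number.

```r
# Assumed sample data, matching the outputs shown in the answers.
df <- data.frame(
  a = c("1, 2, 3, 4", "2, 4, 6, 8", "3, 6, 9, 12"),
  stringsAsFactors = FALSE
)

# Split each string on commas; as.numeric() tolerates the leading spaces.
parts <- strsplit(df$a, split = ",", fixed = TRUE)

# One mean per row, stored as a new column.
df$mean <- vapply(
  parts,
  function(x) mean(as.numeric(x), na.rm = TRUE),
  numeric(1)
)
```

Because each row is averaged on its own, this also works when rows hold different numbers of values.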
    
  • 2020-12-11 09:32
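
    With data.table, you can split the string into separate columns with tstrsplit and then take the row means:

    library(data.table)
    cols <- paste0("a", 1:4)
    # setDT() converts df by reference; := adds the split columns a1..a4 to df
    setDT(df)[, (cols) := tstrsplit(a, ",", fixed=TRUE, type.convert=TRUE)
            ][, .(Mean = rowMeans(.SD)), .SDcols = cols]
    #    Mean
    # 1:  2.5
    # 2:  5.0
    # 3:  7.5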
    

    Alternatively,

    rowMeans(setDT(tstrsplit(df$a, ",", fixed=TRUE, type.convert=TRUE)))
    # [1] 2.5 5.0 7.5
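
To keep the result as a column without creating the intermediate `a1`..`a4` columns, the split can be fed directly into `:=`. This is a sketch under assumptions, not the answerer's code: the table name `dt` and the use of `do.call(cbind, ...)` to build a matrix for `rowMeans` are illustrative choices, and the data matches the sample outputs above.

```r
library(data.table)

# Assumed sample data, matching the outputs shown in the answers.
dt <- data.table(a = c("1, 2, 3, 4", "2, 4, 6, 8", "3, 6, 9, 12"))

# tstrsplit() returns one vector per position; cbind them into a matrix
# and add the row means as a new column by reference.
dt[, Mean := rowMeans(
  do.call(cbind, tstrsplit(a, ", ", fixed = TRUE, type.convert = TRUE)),
  na.rm = TRUE
)]
```

`tstrsplit` pads shorter rows with `NA`, so `na.rm = TRUE` keeps the mean defined when rows have unequal lengths.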
    