Calculate mean for multiple columns in data.frame

温柔的废话 2020-12-16 23:27

Is it possible to calculate the means of multiple columns using just the mean function?

e.g.

mean(iris[,1])

i

3 Answers
  • 2020-12-16 23:57

    Try colMeans. The columns must be numeric, so for larger datasets you can filter on is.numeric first:

    colMeans(iris[sapply(iris, is.numeric)])
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
        5.843333     3.057333     3.758000     1.199333 
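
    As a rough illustration of why the numeric filter matters, the sketch below (base R only, using the built-in iris data) first calls colMeans on the unfiltered data frame and captures the resulting error, then shows the na.rm argument for columns with missing values. The names err, num, with_na and without_na are made up for this example, and the NA is inserted artificially.

    ```r
    # colMeans() refuses a data frame that still contains the Species factor
    err <- tryCatch(colMeans(iris), error = function(e) conditionMessage(e))
    print(err)

    # keep only the numeric columns, then introduce one missing value
    num <- iris[sapply(iris, is.numeric)]
    num[1, "Sepal.Length"] <- NA

    with_na    <- colMeans(num)                # Sepal.Length comes back as NA
    without_na <- colMeans(num, na.rm = TRUE)  # the missing value is skipped
    print(with_na)
    print(without_na)
    ```

    Whether na.rm = TRUE is appropriate depends on how the missing values arose, so treat it as an option rather than a default.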
    

    Benchmark

    The dplyr and data.table versions seem slow here. Perhaps someone can replicate the findings to verify them.

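    library(microbenchmark)
    library(data.table)
    library(dplyr)

    # big.df and big.dt are built in the Data section below.
    # summarise_each() and funs() are deprecated in current dplyr;
    # they are kept here because the timings were taken with them.
    microbenchmark(
      plafort = colMeans(big.df[sapply(big.df, is.numeric)]),
      Carlos  = colMeans(Filter(is.numeric, big.df)),
      Cdtable = big.dt[, lapply(.SD, mean)],
      Cdplyr  = big.df %>% summarise_each(funs(mean))
      )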
    #Unit: milliseconds
    #    expr       min        lq     mean    median       uq       max
    # plafort  9.862934 10.506778 12.07027 10.699616 11.16404  31.23927
    #  Carlos  9.215143  9.557987 11.30063  9.843197 10.21821  65.21379
    # Cdtable 57.157250 64.866996 78.72452 67.633433 87.52451 264.60453
    #  Cdplyr 62.933293 67.853312 81.77382 71.296555 91.44994 182.36578
    

    Data

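    library(data.table)  # for as.data.table()

    m <- matrix(1:1e6, 1000)                      # 1000 x 1000 integer matrix
    m2 <- matrix(rep('a', 1000), ncol = 1)        # one character column
    big.df <- as.data.frame(cbind(m2, m), stringsAsFactors = FALSE)
    big.df[, -1] <- lapply(big.df[, -1], as.numeric)  # cbind() made every column character
    big.dt <- as.data.table(big.df)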
    
  • 2020-12-16 23:57

    With sapply + Filter:

    sapply(Filter(is.numeric, iris), mean)
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
        5.843333     3.057333     3.758000     1.199333 
    

    With dplyr:

    library(dplyr)
    iris %>% summarise_each(funs(mean))
       Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1:     5.843333    3.057333        3.758    1.199333      NA
    

    PS: dplyr now has summarise_if, which keeps only the numeric columns and so avoids the NA for Species:

    iris %>% summarise_if(is.numeric, mean)
    #>   Sepal.Length Sepal.Width Petal.Length Petal.Width
    #> 1     5.843333    3.057333        3.758    1.199333
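
    For more recent dplyr releases, a minimal sketch of the same summary with across(), assuming dplyr 1.0.0 or later is installed. The variable name num_means is made up for this example.

    ```r
    library(dplyr)

    # across() applies mean to every column selected by where(is.numeric),
    # so the Species factor is skipped instead of producing NA
    num_means <- iris %>%
      summarise(across(where(is.numeric), mean))

    print(num_means)
    ```

    The result should have the same four columns as the summarise_if call above.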
    

    With data.table:

    library(data.table)
    iris <- data.table(iris)
    iris[,lapply(.SD, mean)]
       Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1:     5.843333    3.057333        3.758    1.199333      NA
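
    The NA for Species appears because mean is applied to every column, including the factor. One possible way around it, sketched here under the assumption that data.table is installed, is to pass the numeric column names through .SDcols. The names iris_dt, num_cols and dt_means are made up for this example.

    ```r
    library(data.table)

    iris_dt <- as.data.table(iris)  # a copy, so the built-in iris is not overwritten

    # names of the numeric columns only
    num_cols <- names(iris_dt)[sapply(iris_dt, is.numeric)]

    # .SDcols limits .SD to those columns before mean is applied
    dt_means <- iris_dt[, lapply(.SD, mean), .SDcols = num_cols]
    print(dt_means)
    ```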
    
  • 2020-12-17 00:03

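    Calling mean on several columns does run when all of them are numeric, but it returns a single overall mean of every value rather than one mean per column. See the example below:

    a <- c(1, 2, 3)
    mean(a)   # 2

    b <- c(2, 4, 6)
    mean(b)   # 4

    d <- c(3, 6, 9)

    # cbind() of numeric vectors gives a numeric matrix, not a data.frame
    mydata <- cbind(b, a, d)

    # one mean over all nine values, not three column means
    mean(mydata[, 1:3])   # 4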
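
    If one mean per column is wanted from that matrix, a minimal base R sketch follows. It rebuilds the same mydata so the block is self-contained; the names overall, per_col and per_col2 are made up for this example.

    ```r
    a <- c(1, 2, 3)
    b <- c(2, 4, 6)
    d <- c(3, 6, 9)
    mydata <- cbind(b, a, d)   # 3 x 3 numeric matrix

    overall  <- mean(mydata)             # one number over all nine values
    per_col  <- colMeans(mydata)         # one mean per column: b, a, d
    per_col2 <- apply(mydata, 2, mean)   # same result via apply over columns

    print(overall)
    print(per_col)
    ```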
    