Aggregate by factor levels, keeping other variables in the resulting data frame

太阳男子 2020-12-05 18:45

I'm trying to calculate the minimum values of a numeric column for each level of a factor, while keeping values of another factor in the resulting data frame.
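
The sample data did not survive on this page, so here is a minimal sketch of the setup, assuming a data frame `dat` with the columns `code`, `index` and `value` that the answers refer to. The rows are chosen to be consistent with the outputs printed in the answers, but the values in rows 3 and 4 are invented for illustration. It shows the problem: a plain `aggregate()` by `code` finds the minimum but drops `index`.

```r
# Hypothetical sample data: column names follow the answers;
# the values in rows 3 and 4 are made up for illustration.
dat <- data.frame(
  code  = factor(c("HH11", "HH45", "JL03", "JL03", "JL03")),
  index = factor(c("023434", "3377477", "3388595", "3377477", "1177777")),
  value = c(24.1, 37.2, 78.9, 34.6, 20.0)
)

# Minimum value per level of `code`; note that `index` is not carried along.
mins <- aggregate(value ~ code, data = dat, FUN = min)
print(mins)
```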



        
5 answers
  • 2020-12-05 19:20

    Just to show that there are always multiple ways to skin a cat:

    Using ave to get the indexes of the minimum rows in each group:

    dat[which(ave(dat$value, dat$code, FUN = function(x) x == min(x)) == 1), ]
    
    #  code   index value
    #1 HH11  023434  24.1
    #2 HH45 3377477  37.2
    #5 JL03 1177777  20.0
    

    This method also has the potential benefit of returning multiple rows per code group when several rows tie for the minimum.
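
    The tie behaviour can be checked with a small sketch. The data frame `tied` below is hypothetical (it is not the asker's data), and the `tapply()` comparison is only one possible way to get a single row per group in base R.

    ```r
    # Hypothetical data with a tie: both JL03 rows share the minimum value 20.0.
    tied <- data.frame(
      code  = c("HH11", "JL03", "JL03"),
      index = c("023434", "3377477", "1177777"),
      value = c(24.1, 20.0, 20.0)
    )

    # ave() returns a numeric vector, so the logical result is coerced to 0/1;
    # comparing with 1 turns it back into a logical row selector.
    is_min <- ave(tied$value, tied$code, FUN = function(x) x == min(x)) == 1
    all_minima <- tied[is_min, ]       # keeps both tied JL03 rows

    # which.min() reports only the first minimum within each group.
    first_idx <- tapply(seq_len(nrow(tied)), tied$code,
                        function(i) i[which.min(tied$value[i])])
    first_minima <- tied[as.integer(first_idx), ]   # one row per code
    ```

    Which behaviour is preferable depends on whether tied rows should all be reported or collapsed to one.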

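    And another method using by. Note that this pairs each group's minimum value with the code and index taken from the group's first row, not from the row that holds the minimum, which is why JL03 shows index 3388595 here: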

    do.call(rbind,
      by(dat, dat$code, function(x) cbind(x[1,c("code","index")],value=min(x$value)))
    )
    #      code   index value
    # HH11 HH11  023434  24.1
    # HH45 HH45 3377477  37.2
    # JL03 JL03 3388595  20.0
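
    If the index should come from the row that actually holds the minimum, one possible variant is to select that whole row inside `by()` with `which.min()`. This is a sketch rather than the answerer's code, and the data frame is hypothetical: it is consistent with the outputs printed in this thread, but the values in rows 3 and 4 are invented.

    ```r
    # Hypothetical sample data; values in rows 3 and 4 are made up.
    dat <- data.frame(
      code  = c("HH11", "HH45", "JL03", "JL03", "JL03"),
      index = c("023434", "3377477", "3388595", "3377477", "1177777"),
      value = c(24.1, 37.2, 78.9, 34.6, 20.0)
    )

    # For each code, keep the complete row with the smallest value,
    # then stack the per-group rows back into one data frame.
    min_rows <- do.call(rbind,
      by(dat, dat$code, function(x) x[which.min(x$value), ])
    )
    print(min_rows)
    ```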
    
  • 2020-12-05 19:23

    With the dplyr and data.table packages, you can find the index of the row holding the minimum value in each group and subset on it: pass it to slice() in dplyr, or use it to index .SD in data.table.

    library(dplyr)
    library(data.table)
    
    dat %>%
      group_by(code) %>%
      slice(which.min(value))
    
    #    code   index value
    #  <fctr>  <fctr> <dbl>
    #1   HH11  023434  24.1
    #2   HH45 3377477  37.2
    #3   JL03 1177777  20.0
    
    setDT(dat)[, .SD[which.min(value)], by = code]
    
    #   code   index value
    #1: HH11  023434  24.1
    #2: HH45 3377477  37.2
    #3: JL03 1177777  20.0
    
  • 2020-12-05 19:28

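    If you had included the index variable in the grouping, that would have done it, provided each code maps to a single index. If index varies within a code, both calls return one row per (code, index) pair rather than one row per code.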

    library(plyr)
    
    # ddply
    ddply(dat, .(code,index), summarise, val = min(value))
    
    # base R
    aggregate(value ~ code + index, dat, min)
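
    A quick check of what grouping on both columns returns when index varies within a code. The data frame is hypothetical: it is consistent with the outputs printed elsewhere in this thread, but the values in rows 3 and 4 are invented.

    ```r
    # Hypothetical sample data; values in rows 3 and 4 are made up.
    dat <- data.frame(
      code  = c("HH11", "HH45", "JL03", "JL03", "JL03"),
      index = c("023434", "3377477", "3388595", "3377477", "1177777"),
      value = c(24.1, 37.2, 78.9, 34.6, 20.0)
    )

    # One row per distinct (code, index) pair, so JL03 appears three times.
    per_pair <- aggregate(value ~ code + index, data = dat, FUN = min)
    print(per_pair)
    ```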
    
  • 2020-12-05 19:30

    You need to merge the result of aggregate with the original data.frame:

    merge(aggregate(value ~ code, dat, min), dat, by = c("code", "value"))
    ##   code value   index
    ## 1 HH11  24.1  023434
    ## 2 HH45  37.2 3377477
    ## 3 JL03  20.0 1177777
    
  • 2020-12-05 19:32

    Well, a few minutes more searching would have gotten me there... this answer seems to do the trick:

    merge(dat, aggregate(value ~ code, dat, min))
