Find unique combinations based on two columns and calculate the mean

Submitted by 好久不见 on 2019-12-13 08:45:10

Question


I have a problem in R that I can't seem to solve.

I have the following dataframe:

Image 1

I would like to:

  1. Find the unique combinations of the columns 'Species' and 'Effects'
  2. Report the concentration belonging to this unique combination
  3. If this unique combination is present more than once, calculate the mean concentration

I would like to end up with the following dataframe:

Image 2

I have tried the following script to get the unique combinations:

UniqueCombinations <- Data[!duplicated(Data[,1:2]),]

but I don't know how to proceed from there.

Thanks in advance for your answers!

Tina


Answer 1:


Try the following (thanks to Brandon Bertelsen for the nice comment):

Creating your data:

foo = data.frame(Species=c(rep("A",4),"B",rep("C",3),"D","D"), 
                 Effect=c(rep("Reproduction",3), rep("Growth",2),
                          "Reproduction", rep("Mortality",2), rep("Growth",2)), 
                 Concentration=c(1.2,1.4,1.3,1.5,1.6,1.2,1.1,1,1.3,1.4))

Using the great plyr package for a bit of magic :)

library(plyr)
# Mean concentration per Species/Effect group; the result column is named V1
ddply(foo, .(Species,Effect), function(x) mean(x[,"Concentration"]))

And this is a slightly more complicated but cleaner version (thanks again to Brandon Bertelsen):

ddply(foo, .(Species,Effect), summarize, mean=mean(Concentration))



Answer 2:


Create some example data:

dat <- data.frame(Species = rep.int(LETTERS[1:4], c(4, 1, 3, 2)),
                  Effect = c(rep("Reproduction", 3), "Growth", "Growth",
                             "Reproduction", "Mortality", "Mortality",
                             "Growth", "Growth"),
                  Concentration = rnorm(10))

You can use the function aggregate:

aggregate(Concentration ~ Species + Effect, dat, mean)
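Because `rnorm(10)` draws random values, the means above differ from run to run. As a minimal sketch for checking the result by hand, the same call can be run on fixed values. The name `dat_fixed` is hypothetical, and the concentrations are assumed to be the ones from the first answer's example data:

```r
# Hypothetical data frame `dat_fixed`: same layout as `dat`, but with the
# fixed concentrations from the first answer instead of random draws.
dat_fixed <- data.frame(
  Species = rep.int(LETTERS[1:4], c(4, 1, 3, 2)),
  Effect = c(rep("Reproduction", 3), "Growth", "Growth",
             "Reproduction", "Mortality", "Mortality",
             "Growth", "Growth"),
  Concentration = c(1.2, 1.4, 1.3, 1.5, 1.6, 1.2, 1.1, 1, 1.3, 1.4)
)

# One row per Species/Effect combination, holding the group mean.
res <- aggregate(Concentration ~ Species + Effect, data = dat_fixed, FUN = mean)
print(res)
```

Combinations that occur only once (for example B/Growth) keep their single concentration, since the mean of one value is that value.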



Answer 3:


Just for fun before I call it a night... Assuming your data.frame is called "dat", here are two more options (the output shown corresponds to the concentration values from the first answer's example data):

  1. A data.table solution.

    library(data.table)
    datDT <- data.table(dat, key = c("Species", "Effect"))
    datDT[, list(Concentration = mean(Concentration)), by = key(datDT)]
    #    Species       Effect Concentration
    # 1:       A       Growth          1.50
    # 2:       A Reproduction          1.30
    # 3:       B       Growth          1.60
    # 4:       C    Mortality          1.05
    # 5:       C Reproduction          1.20
    # 6:       D       Growth          1.35
    
  2. An sqldf solution.

    library(sqldf)
    sqldf("select Species, Effect,
          avg(Concentration) `Concentration`
          from dat
          group by Species, Effect")
    #   Species       Effect Concentration
    # 1       A       Growth          1.50
    # 2       A Reproduction          1.30
    # 3       B       Growth          1.60
    # 4       C    Mortality          1.05
    # 5       C Reproduction          1.20
    # 6       D       Growth          1.35
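
The `duplicated()` attempt from the question can also be carried through in base R. The following is only a sketch, assuming the column names and values from the first answer's example data: `ave()` replaces each concentration with its group mean, and `duplicated()` then keeps one row per Species/Effect pair. The name `unique_means` is hypothetical.

```r
# Example data, assumed to match the first answer's values.
dat <- data.frame(
  Species = c(rep("A", 4), "B", rep("C", 3), "D", "D"),
  Effect = c(rep("Reproduction", 3), rep("Growth", 2),
             "Reproduction", rep("Mortality", 2), rep("Growth", 2)),
  Concentration = c(1.2, 1.4, 1.3, 1.5, 1.6, 1.2, 1.1, 1, 1.3, 1.4)
)

# Replace each concentration with the mean of its Species/Effect group.
dat$Concentration <- ave(dat$Concentration, dat$Species, dat$Effect, FUN = mean)

# Keep the first row of each unique Species/Effect combination.
unique_means <- dat[!duplicated(dat[, c("Species", "Effect")]), ]
rownames(unique_means) <- NULL
print(unique_means)
```

Unlike the grouped solutions above, this keeps the combinations in the order of their first appearance in the data rather than sorting them.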
    


Source: https://stackoverflow.com/questions/13017511/find-unique-combinations-based-on-two-columns-and-calculate-the-mean
