Aggregating data based on unique triads in R

Submitted by 拟墨画扇 on 2019-12-11 12:10:58

Question


For a previous, related question I was referred to "Counting existing permutations in R", but I can't apply that answer to my problem. Here is the data I have:

One <- c(rep("X",6),rep("Y",3),rep("Z",2))
Two <- c(rep("A",4),rep("B",6),rep("C",1))
Three <- c(rep("J",5),rep("K",2),rep("L",4))
Number <- runif(11)


df <- data.frame(One,Two,Three,Number)


   One Two Three     Number
1    X   A     J 0.10511669
2    X   A     J 0.62467760
3    X   A     J 0.24232663
4    X   A     J 0.38358854
5    X   B     J 0.04658226
6    X   B     K 0.26789844
7    Y   B     K 0.07685341
8    Y   B     L 0.21372276
9    Y   B     L 0.13620971
10   Z   B     L 0.49073692
11   Z   C     L 0.52968279

I tried:

aggregate(df, df[,c(1:3)],FUN = c(length,mean))

and received:

Error in match.fun(FUN) : 
'c(length, mean)' is not a function, character or symbol

I want to build a new data frame that gives the frequency of each unique triad (One, Two, Three), plus a column containing the median of Number for each triad. For example, for the (X,A,J) triad I want Count = 4 and Median equal to the median of the first four values of Number.


Answer 1:


You could use dplyr

 library(dplyr)
 res <- df%>%
 group_by(One,Two,Three) %>%
 summarize(length=n(), Mean=mean(Number)) #change `mean` to `median` if you want `median`

 str(res)
#Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':    7 obs. of  5 variables:
 ----------
  str(as.data.frame(res))
#'data.frame':  7 obs. of  5 variables:
# $ One   : Factor w/ 3 levels "X","Y","Z": 1 1 1 2 2 3 3
# $ Two   : Factor w/ 3 levels "A","B","C": 1 2 2 2 2 2 3
# $ Three : Factor w/ 3 levels "J","K","L": 1 1 2 2 3 3 3
# $ length: int  4 1 1 1 2 1 1
# $ Mean  : num  0.689 0.989 0.524 0.181 0.345 ...

or

library(data.table)
setDT(df)[,list(length=.N, Mean=mean(Number)),by=list(One,Two,Three)]
#      One Two Three length      Mean
# 1:   X   A     J      4 0.3660189
# 2:   X   B     J      1 0.8389641
# 3:   X   B     K      1 0.2815004
# 4:   Y   B     K      1 0.4990414
# 5:   Y   B     L      2 0.3814621
# 6:   Z   B     L      1 0.1144003
# 7:   Z   C     L      1 0.9508751



Answer 2:


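# Build one key per row by pasting the three grouping vectors together
OTT <- paste(One,Two,Three)
# Mean of Number and number of rows for each key
ott.mean <- tapply(Number,OTT,mean)
ott.count <- tapply(OTT,OTT,length)
cbind(ott.mean,ott.count)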
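Since the question asks for the median rather than the mean, here is a minimal sketch of the same `tapply` idea with `median`, collected into a data frame. The seed, the `key` variable and the `Triad`, `Count` and `Median` column names are assumptions made for illustration; they are not part of the original answer.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# Same data layout as in the question
df <- data.frame(
  One    = c(rep("X", 6), rep("Y", 3), rep("Z", 2)),
  Two    = c(rep("A", 4), rep("B", 6), rep("C", 1)),
  Three  = c(rep("J", 5), rep("K", 2), rep("L", 4)),
  Number = runif(11)
)

# One key per row, e.g. "X A J"
key <- paste(df$One, df$Two, df$Three)

# Both calls group by the same key, so the results line up element by element
ott.median <- tapply(df$Number, key, median)
ott.count  <- tapply(df$Number, key, length)

res <- data.frame(
  Triad  = names(ott.count),
  Count  = as.vector(ott.count),
  Median = as.vector(ott.median)
)
res
```

This approach keeps the triad as a single pasted string; if the three columns are needed separately afterwards, one of the `aggregate`, dplyr or data.table answers is probably more convenient.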



Answer 3:


Seems pretty straightforward:

# Store the result as d2 so it can be inspected below
d2 <- aggregate( df$Number, df[ , c(1:3)],
                    FUN = function(x) { c( len=length(x), mn=mean(x) ) } )

@latemail: I'm not sure what you mean by a 'borked' data.frame. The fourth element is a matrix, and matrices are legitimate components of data frames:

> d2[[4]]

     len        mn
[1,]   4 0.7531795
[2,]   1 0.8777003
[3,]   1 0.8003510
[4,]   1 0.6113566
[5,]   2 0.2470044
[6,]   1 0.3444656
[7,]   1 0.7517357

And the matrix can be accessed in the usual way:

> d2[ , 'x'][ , "mn"]
[1] 0.7531795 0.8777003 0.8003510 0.6113566 0.2470044 0.3444656 0.7517357
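If separate columns are preferred over a matrix column, the result can be flattened afterwards. The following is a sketch under a few assumptions: it uses the formula interface of `aggregate`, swaps in `median` because that is what the question asks for, and the seed and the names `agg`, `flat`, `Count` and `Median` are illustrative only.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

# Same data layout as in the question
df <- data.frame(
  One    = c(rep("X", 6), rep("Y", 3), rep("Z", 2)),
  Two    = c(rep("A", 4), rep("B", 6), rep("C", 1)),
  Three  = c(rep("J", 5), rep("K", 2), rep("L", 4)),
  Number = runif(11)
)

# FUN returns a named vector of length 2, so the Number column
# of the result is a 7 x 2 matrix with columns Count and Median
agg <- aggregate(Number ~ One + Two + Three, data = df,
                 FUN = function(x) c(Count = length(x), Median = median(x)))

# Bind the three grouping columns to the matrix converted to a data frame
flat <- cbind(agg[1:3], as.data.frame(agg$Number))
flat
```

After this step `flat` has five ordinary columns (One, Two, Three, Count, Median), so `flat$Median` works without the two-step indexing shown above.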


Source: https://stackoverflow.com/questions/24926039/aggregating-data-based-on-unique-triads-in-r
