Question
I was referred to "Counting existing permutations in R", a previous related question, but I can't apply it to my problem. Here is the data I have:
One <- c(rep("X",6),rep("Y",3),rep("Z",2))
Two <- c(rep("A",4),rep("B",6),rep("C",1))
Three <- c(rep("J",5),rep("K",2),rep("L",4))
Number <- runif(11)
df <- data.frame(One,Two,Three,Number)
One Two Three Number
1 X A J 0.10511669
2 X A J 0.62467760
3 X A J 0.24232663
4 X A J 0.38358854
5 X B J 0.04658226
6 X B K 0.26789844
7 Y B K 0.07685341
8 Y B L 0.21372276
9 Y B L 0.13620971
10 Z B L 0.49073692
11 Z C L 0.52968279
I tried:
aggregate(df, df[,c(1:3)],FUN = c(length,mean))
and received:
Error in match.fun(FUN) :
'c(length, mean)' is not a function, character or symbol
I am trying to aggregate by creating a new data frame that gives me the frequency of each unique triad (One, Two, Three) and another column that contains the median of Number for each unique triad. So for the (X,A,J) triad, I want Count = 4 and Median to be the median of the first four numbers under Number.
Answer 1:
You could use dplyr
library(dplyr)
res <- df%>%
group_by(One,Two,Three) %>%
summarize(length=n(), Mean=mean(Number)) #change `mean` to `median` if you want `median`
str(res)
#Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 7 obs. of 5 variables:
----------
str(as.data.frame(res))
#'data.frame': 7 obs. of 5 variables:
# $ One : Factor w/ 3 levels "X","Y","Z": 1 1 1 2 2 3 3
# $ Two : Factor w/ 3 levels "A","B","C": 1 2 2 2 2 2 3
# $ Three : Factor w/ 3 levels "J","K","L": 1 1 2 2 3 3 3
# $ length: int 4 1 1 1 2 1 1
# $ Mean : num 0.689 0.989 0.524 0.181 0.345 ...
or
library(data.table)
setDT(df)[,list(length=.N, Mean=mean(Number)),by=list(One,Two,Three)]
# One Two Three length Mean
# 1: X A J 4 0.3660189
# 2: X B J 1 0.8389641
# 3: X B K 1 0.2815004
# 4: Y B K 1 0.4990414
# 5: Y B L 2 0.3814621
# 6: Z B L 1 0.1144003
# 7: Z C L 1 0.9508751
Answer 2:
OTT <- paste(One,Two,Three)
ott.mean <- tapply(Number,OTT,mean)
ott.count <- tapply(OTT,OTT,length)
cbind(ott.mean,ott.count)
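The snippet above pastes the three columns into a single key per row and then applies tapply once per statistic. A minimal sketch of the same idea follows, using median (which the question asks for) and fixed values in place of runif(11) so the result can be checked by hand. The fixed numbers and the names key and res2 are assumptions made for illustration.

```r
# Same grouping columns as in the question
One   <- c(rep("X", 6), rep("Y", 3), rep("Z", 2))
Two   <- c(rep("A", 4), rep("B", 6), rep("C", 1))
Three <- c(rep("J", 5), rep("K", 2), rep("L", 4))
# Assumption: fixed values instead of runif(11), for reproducibility
Number <- c(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12)

# One string per triad, e.g. "X A J"
key <- paste(One, Two, Three)

# tapply returns one value per distinct key, named by that key
ott.median <- tapply(Number, key, median)
ott.count  <- tapply(Number, key, length)

# Both results share the same key order, so cbind lines them up
res2 <- cbind(ott.median, ott.count)
res2["X A J", ]
```

The result is a matrix whose row names are the pasted keys, so the triad is no longer available as three separate columns. If those columns are needed, one of the data-frame-based answers is probably a better fit.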
Answer 3:
Seems pretty straightforward:
d2 <- aggregate( df$Number, df[ , c(1:3)],
FUN = function(x) { c( len=length(x), mn=mean(x) ) } )
@latemail. Not sure what you mean by a 'borked' data.frame. The fourth element is a matrix. Matrices are legitimate components of dataframes:
> d2[[4]]
len mn
[1,] 4 0.7531795
[2,] 1 0.8777003
[3,] 1 0.8003510
[4,] 1 0.6113566
[5,] 2 0.2470044
[6,] 1 0.3444656
[7,] 1 0.7517357
And the matrix can be accessed in the usual way:
> d2[ , 'x'][ , "mn"]
[1] 0.7531795 0.8777003 0.8003510 0.6113566 0.2470044 0.3444656 0.7517357
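If ordinary columns are preferred over a matrix column, one common idiom is to pass the aggregate result through do.call(data.frame, ...). The sketch below is not part of the original answer. It assumes fixed values in place of runif(11) and uses median with the hypothetical labels Count and Median to match what the question asks for.

```r
# Assumption: fixed Number values so the output is reproducible
df <- data.frame(
  One    = c(rep("X", 6), rep("Y", 3), rep("Z", 2)),
  Two    = c(rep("A", 4), rep("B", 6), rep("C", 1)),
  Three  = c(rep("J", 5), rep("K", 2), rep("L", 4)),
  Number = c(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12)
)

# FUN returns a named vector of length 2, so column "x" becomes a 2-column matrix
d2 <- aggregate(df$Number, df[, 1:3],
                FUN = function(x) c(Count = length(x), Median = median(x)))

# Rebuilding the data frame from its components splits the matrix
# into separate columns named x.Count and x.Median
flat <- do.call(data.frame, d2)
names(flat)
```

After this step, flat has five plain columns, so the count and median for the (X, A, J) triad can be read with ordinary row and column indexing.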
Source: https://stackoverflow.com/questions/24926039/aggregating-data-based-on-unique-triads-in-r