I am trying to understand dplyr. I am splitting the values in my data frame by group, bin, and sign, and I want the mean value for each group/bin/sign combination. I would like to output a data frame with the mean and the count for each group/bin/sign combination, plus the total count for each group. I think I have it, but sometimes I get different values in base R compared to the output of dplyr. Am I doing this correctly? It is also very contorted... is there a more direct way?
    library(ggplot2)

    df <- data.frame(
      id = sample(LETTERS[1:3], 100, replace = TRUE),
      tobin = rnorm(1000),
      value = rnorm(1000)
    )
    df$tobin[sample(nrow(df), 10)] <- 0
    df$bin <- cut_interval(abs(df$tobin), length = 1)
    df$sign <- ifelse(df$tobin == 0, "NULL", ifelse(df$tobin > 0, "-", "+"))

    # Find mean of value by group, bin, and sign using dplyr
    library(dplyr)
    res <- df %>%
      group_by(id, bin, sign) %>%
      summarise(Num = length(bin), value = mean(value, na.rm = TRUE))
    # Total count per id (assigned to `total` so the lookup below can use it)
    total <- res %>%
      group_by(id) %>%
      summarise(total = sum(Num))
    res <- data.frame(res)
    total <- data.frame(total)
    res$total <- total[match(res$id, total$id), "total"]
    res[res$id == "A" & res$bin == "[0,1]" & res$sign == "NULL", ]

    # Check in base R if mean by group, bin, and sign is correct
    # Sometimes not?
    groupA <- df[df$id == "A" & df$bin == "[0,1]" & df$sign == "NULL", ]
    mean(groupA$value, na.rm = TRUE)
I am going crazy because it doesn't work on my data, and this command just repeats the mean of the whole dataset:
    ddply(df, .(id, bin, sign), summarize, mean = mean(value, na.rm = TRUE))
Every row of the mean column is equal to mean(value, na.rm=TRUE) over the whole data frame, so the grouping is completely ignored. All the grouping columns are factors, and value is numeric.
This however works:
    with(df, aggregate(df$value, by = list(id, bin, sign), FUN = function(x) c(mean(x))))
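For reference, here is a rough base-R-only sketch of the output I am after, which I would use as the baseline to compare the dplyr result against. It assumes the same column names as above (`id`, `bin`, `sign`, `value`). The tiny `df_small` data frame and the names `means`, `counts`, and `ref` are made up purely for illustration and are not part of my real data.

```r
# Hypothetical toy data with the same columns as the real data frame
df_small <- data.frame(
  id    = c("A", "A", "A", "B", "B"),
  bin   = c("[0,1]", "[0,1]", "(1,2]", "[0,1]", "[0,1]"),
  sign  = c("+", "+", "-", "+", "-"),
  value = c(1, 3, 5, 2, 4)
)

# Mean of value for each id/bin/sign combination
means <- aggregate(value ~ id + bin + sign, data = df_small, FUN = mean)

# Number of rows for each id/bin/sign combination
counts <- aggregate(value ~ id + bin + sign, data = df_small, FUN = length)
names(counts)[names(counts) == "value"] <- "Num"

# One row per combination, holding both the mean and the count
ref <- merge(means, counts, by = c("id", "bin", "sign"))

# Total number of rows per id, repeated on every row belonging to that id
ref$total <- ave(ref$Num, ref$id, FUN = sum)

print(ref)
```

Note that the formula interface of `aggregate` drops rows with missing values by default, so this only matches `mean(..., na.rm = TRUE)` as long as the NAs are in `value` and not in the grouping columns.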
Please help me..