plyr | 易学教程

Why are my dplyr group_by & summarize not working properly? (name-collision with plyr)


I have a data frame that looks like this:

    #df
    ID DRUG FED AUC0t Tmax Cmax
    1  1    0   100   5    20
    2  1    1   200   6    25
    3  0    1   NA    2    30
    4  0    0   150   6    65

And so on. I want to summarize some statistics on AUC, Tmax and Cmax by drug (DRUG) and fed status (FED). I use dplyr. For example, for the AUC:

    CI90lo <- function(x) quantile(x, probs=0.05, na.rm=TRUE)
    CI90hi <- function(x) quantile(x, probs=0.95, na.rm=TRUE)
    summary <- df %>%
      group_by(DRUG,FED) %>%
      summarize(mean=mean(AUC0t, na.rm=TRUE),
                low = CI90lo(AUC0t),
                high= CI90hi(AUC0t),
                min=min(AUC0t, na.rm=TRUE),
                max=max(AUC0t,na.rm=TRUE),
                sd= sd(AUC0t, na.rm=TRUE))
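The title points to a name collision: when plyr is attached after dplyr, a bare summarize() resolves to plyr's version, which ignores dplyr groups. A minimal sketch of one way around this is to call the dplyr functions with an explicit namespace. The toy data below only mirrors the four rows shown in the question, and the names auc_summary and n_obs are made up for illustration.

```r
library(dplyr)

# Toy data mirroring the four rows shown in the question
df <- data.frame(
  ID    = 1:4,
  DRUG  = c(1, 1, 0, 0),
  FED   = c(0, 1, 1, 0),
  AUC0t = c(100, 200, NA, 150),
  Tmax  = c(5, 6, 2, 6),
  Cmax  = c(20, 25, 30, 65)
)

# Namespace-qualified calls resolve to dplyr even if plyr masks the names
auc_summary <- df %>%
  dplyr::group_by(DRUG, FED) %>%
  dplyr::summarise(
    mean  = mean(AUC0t, na.rm = TRUE),
    n_obs = sum(!is.na(AUC0t)),   # hypothetical extra column: non-missing count
    .groups = "drop"
  )

print(auc_summary)
```

With the qualified calls the result has one row per DRUG/FED combination rather than a single overall row.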

Get the means of sub groups of means in R


Question: I'm new to R and I don't know how to get R to calculate the means of subgroups of means, which are themselves the means of a subgroup. I'll explain more clearly. I have a data frame like this:

    GROUP WORD WLN
    1     1    4
    1     1    3
    1     1    3
    1     2    2
    1     2    2
    1     2    3
    2     3    1
    2     3    1
    2     3    2
    2     4    1
    2     4    1
    2     4    1
    ...   ...  ...

but the real one has a total of 5 groups and 25 words (5 words in each group; every word has been assigned a number from 1 to 4 by 5 subjects...). I need to get the means of WLN for every word and I can do
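One possible way to get a mean of means, sketched here with base R's aggregate() on the twelve rows shown above: first average WLN per word, then average those word means per group. The names wln, word_means and group_means are illustrative only.

```r
# The twelve rows shown in the question
wln <- data.frame(
  GROUP = rep(1:2, each = 6),
  WORD  = rep(1:4, each = 3),
  WLN   = c(4, 3, 3, 2, 2, 3, 1, 1, 2, 1, 1, 1)
)

# Step 1: mean WLN for every word (keeping its group)
word_means <- aggregate(WLN ~ GROUP + WORD, data = wln, FUN = mean)

# Step 2: mean of the word means within each group
group_means <- aggregate(WLN ~ GROUP, data = word_means, FUN = mean)

print(word_means)
print(group_means)
```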

faster way to create variable that aggregates a column by id [duplicate]


Question: This question already has answers here: Calculate group mean (or other summary stats) and assign to original data (4 answers). Closed 2 years ago.

Is there a faster way to do this? I guess this is unnecessarily slow and that a task like this can be accomplished with base functions.

    df <- ddply(df, "id", function(x) cbind(x, perc.total = sum(x$cand.perc)))

I'm quite new to R. I have looked at by(), aggregate() and tapply(), but didn't get them to work at all or in the way I wanted. Rather
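As a sketch of a base-R alternative, ave() applies a function within groups and returns a vector aligned with the original rows, so no cbind() per group is needed. The column names id and cand.perc come from the question; the sample values are invented for illustration.

```r
# Invented sample values; only the column names come from the question
df <- data.frame(
  id        = c(1, 1, 2, 2, 2),
  cand.perc = c(10, 20, 5, 5, 15)
)

# Group-wise sum, repeated on every row of its group
df$perc.total <- ave(df$cand.perc, df$id, FUN = sum)

print(df)
```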

Find number of rows using dplyr/group_by


Question: I am using the mtcars dataset. I want to find the number of records for a particular combination of data. Something very similar to the count(*) group by clause in SQL. ddply() from plyr is working for me:

    library(plyr)
    ddply(mtcars, .(cyl,gear),nrow)

has output

      cyl gear V1
    1   4    3  1
    2   4    4  8
    3   4    5  2
    4   6    3  2
    5   6    4  4
    6   6    5  1
    7   8    3 12
    8   8    5  2

Using this code

    library(dplyr)
    g <- group_by(mtcars, cyl, gear)
    summarise(g, length(gear))

has output

      length(cyl)
    1          32

I found various functions to pass in to
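A single overall row like the one above is what you would see if summarise() resolved to plyr's version instead of dplyr's. A small sketch that sidesteps the ambiguity is to call dplyr::count() with an explicit namespace; the name combo_counts is illustrative.

```r
library(dplyr)

# One row per cyl/gear combination, with the row count in column n
combo_counts <- dplyr::count(mtcars, cyl, gear)

print(combo_counts)
```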

Joining aggregated values back to the original data frame [duplicate]


Question: This question already has an answer here: Calculate group mean (or other summary stats) and assign to original data (4 answers).

One of the design patterns I use over and over is performing a "group by" or "split, apply, combine (SAC)" on a data frame and then joining the aggregated data back to the original data. This is useful, for example, when calculating each county's deviation from the state mean in a data frame with many states and counties. Rarely is my aggregate calculation only a
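The county example can be sketched in base R as an aggregate() followed by a merge() back onto the original rows. All data and names below (counties, state_means, deviation) are hypothetical and only illustrate the pattern described.

```r
# Hypothetical data: two states with a few counties each
counties <- data.frame(
  state  = c("A", "A", "B", "B", "B"),
  county = c("a1", "a2", "b1", "b2", "b3"),
  value  = c(10, 20, 3, 6, 9)
)

# Aggregate: one mean per state
state_means <- aggregate(value ~ state, data = counties, FUN = mean)
names(state_means)[names(state_means) == "value"] <- "state_mean"

# Join the aggregate back to the original rows and compute the deviation
joined <- merge(counties, state_means, by = "state")
joined$deviation <- joined$value - joined$state_mean

print(joined)
```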

Standard error bars using stat_summary


Question: The following code produces bar plots with standard error bars using Hmisc, ddply and ggplot:

    means_se <- ddply(mtcars,.(cyl), function(df) smean.sdl(df$qsec,mult=sqrt(length(df$qsec))^-1))
    colnames(means_se) <- c("cyl","mean","lower","upper")
    ggplot(means_se,aes(cyl,mean,ymax=upper,ymin=lower,group=1)) +
      geom_bar(stat="identity") +
      geom_errorbar()

However, implementing the above using helper functions such as mean_sdl seems much better. For example the following code produces a
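As a rough sketch of the helper-function route, the version below swaps in ggplot2's own mean_se() (mean plus or minus one standard error) in place of the Hmisc helper named in the question, so no pre-aggregation with ddply is needed. It assumes a ggplot2 version where stat_summary() accepts the fun argument (older releases used fun.y); the names p and se_check are illustrative.

```r
library(ggplot2)

# Bars show the group mean; error bars show mean +/- one standard error.
# factor(cyl) gives each cylinder count its own discrete bar.
p <- ggplot(mtcars, aes(x = factor(cyl), y = qsec)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.3)

# mean_se() can also be called directly to inspect what it computes
se_check <- mean_se(c(1, 2, 3))
print(se_check)
```

Printing p in an interactive session draws the plot.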

Aggregate a dataframe on a given column and display another column


I have a dataframe in R of the following form:

    > head(data)
      Group Score Info
    1     1     1    a
    2     1     2    b
    3     1     3    c
    4     2     4    d
    5     2     3    e
    6     2     1    f

I would like to aggregate it following the Score column using the max function:

    > aggregate(data$Score, list(data$Group), max)
      Group.1 x
    1       1 3
    2       2 4

But I would also like to display the Info column associated with the maximum value of the Score column for each group. I have no idea how to do this. My desired output would be:

      Group.1 x y
    1       1 3 c
    2       2 4 d

Any hint?

First, you split the data using split:

    split(z,z$Group)

Then, for each chunk, select the row with max Score:
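The split-then-select idea described above might look like the following sketch, which uses which.max() inside lapply() and binds the chosen rows back together. The source excerpt is cut off before its own code, so this is an illustration under that assumption and not the original answer; z reuses the answer's variable name and best_rows is made up.

```r
# Data from the question, stored under the name used in the answer
z <- data.frame(
  Group = c(1, 1, 1, 2, 2, 2),
  Score = c(1, 2, 3, 4, 3, 1),
  Info  = c("a", "b", "c", "d", "e", "f")
)

# Split by group, keep the row with the highest Score in each chunk
chunks    <- split(z, z$Group)
best_rows <- do.call(rbind, lapply(chunks, function(chunk) {
  chunk[which.max(chunk$Score), ]   # which.max returns the first maximum on ties
}))

print(best_rows)
```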

Sum of rows based on column value


Question: I want to sum rows that have the same value in one column:

    > df <- data.frame("1"=c("a","b","a","c","c"), "2"=c(1,5,3,6,2), "3"=c(3,3,4,5,2))
    > df
      X1 X2 X3
    1  a  1  3
    2  b  5  3
    3  a  3  4
    4  c  6  5
    5  c  2  2

For one column (X2), the data can be aggregated to get the sums of all rows that have the same X1 value:

    > ddply(df, .(X1), summarise, X2=sum(X2))
      X1 X2
    1  a  4
    2  b  5
    3  c  8

How do I do the same for X3 and an arbitrary number of other columns except X1? This is the result I want:

    X1 X2
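One way to sum every remaining column at once, sketched with base R: in aggregate()'s formula interface, the dot on the left-hand side stands for all columns not named on the right. The name sums is illustrative, and the data frame is built with the X1/X2/X3 names directly.

```r
df <- data.frame(
  X1 = c("a", "b", "a", "c", "c"),
  X2 = c(1, 5, 3, 6, 2),
  X3 = c(3, 3, 4, 5, 2)
)

# "." means "every column except those on the right-hand side"
sums <- aggregate(. ~ X1, data = df, FUN = sum)

print(sums)
```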

dplyr summarise: Equivalent of “.drop=FALSE” to keep groups with zero length in output


When using summarise with plyr's ddply function, empty categories are dropped by default. You can change this behavior by adding .drop = FALSE. However, this doesn't work when using summarise with dplyr. Is there another way to keep empty categories in the result? Here's an example with fake data.

    library(dplyr)
    df = data.frame(a=rep(1:3,4), b=rep(1:2,6))
    # Now add an extra level to df$b that has no corresponding value in df$a
    df$b = factor(df$b, levels=1:3)
    # Summarise with plyr, keeping categories with a count of zero
    plyr::ddply(df, "b", summarise, count_a=length(a), .drop=FALSE)
    b count
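In more recent dplyr releases (0.8.0 and later), group_by() itself takes a .drop argument, which gives one possible answer to the question. The sketch below assumes such a version is installed and reuses the fake data from the example; the name counts is illustrative.

```r
library(dplyr)

df <- data.frame(a = rep(1:3, 4), b = rep(1:2, 6))
# Extra factor level 3 has no rows in the data
df$b <- factor(df$b, levels = 1:3)

# .drop = FALSE keeps empty factor levels as zero-row groups
counts <- df %>%
  dplyr::group_by(b, .drop = FALSE) %>%
  dplyr::summarise(count_a = length(a), .groups = "drop")

print(counts)
```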

R: Split unbalanced list in data.frame column


Question: Suppose you have a data frame with the following structure:

    df <- data.frame(a=c(1,2,3,4), b=c("job1;job2", "job1a", "job4;job5;job6", "job9;job10;job11"))

where the column b is a semicolon-delimited list (unbalanced by row). The ideal data.frame would be:

    id,job,jobNum
    1,job1,1
    1,job2,2
    ...
    3,job6,3
    4,job9,1
    4,job10,2
    4,job11,3

I have a partial solution that takes almost 2 hours (170K rows):

    # Split the column by the semicolon. Results in a list.
    df$allJobs <- strsplit(df$b, ";",
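A vectorised sketch of the reshaping, using only base R: split once, then use lengths() to repeat each id and sequence() to number the jobs within each row. This avoids per-row work, though no timing on 170K rows is claimed here. The names parts, n_jobs and long are illustrative.

```r
df <- data.frame(
  a = c(1, 2, 3, 4),
  b = c("job1;job2", "job1a", "job4;job5;job6", "job9;job10;job11")
)

# Split every row once; as.character() guards against factor columns
parts  <- strsplit(as.character(df$b), ";", fixed = TRUE)
n_jobs <- lengths(parts)

long <- data.frame(
  id     = rep(df$a, times = n_jobs),  # repeat each id once per job
  job    = unlist(parts),
  jobNum = sequence(n_jobs)            # 1..n within each original row
)

print(long)
```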