plyr | 易学教程

How to ddply() without sorting?

I use the following code to summarize my data, grouped by Compound, Replicate and Mass.

    summaryDataFrame <- ddply(reviewDataFrame, .(Compound, Replicate, Mass),
                              .fun = calculate_T60_Over_T0_Ratio)

An unfortunate side effect is that the resulting data frame is sorted by those fields. I would like to do this and keep Compound, Replicate and Mass in the same order as in the original data frame. Any ideas? I tried adding a "Sorting" column of sequential integers to the original data, but of course I can't include that in the .variables since I don't want to 'group by' that, and so it is not
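One way to approach the question above, offered as a minimal sketch rather than the asker's eventual solution: record each row's original position, carry the smallest position of every group through the summary, and reorder the result by it. The data frame `review`, its `value` column, and the summary `mean_value` are hypothetical stand-ins, since `reviewDataFrame` and `calculate_T60_Over_T0_Ratio` are not shown in the excerpt.

```r
library(plyr)

# Hypothetical stand-in for reviewDataFrame; column names are assumptions.
review <- data.frame(
  Compound  = c("B", "B", "A", "A", "C"),
  Replicate = c(1, 1, 2, 2, 1),
  value     = c(1, 2, 3, 4, 5)
)

# Record the original row position before grouping.
review$row_id <- seq_len(nrow(review))

# Summarise per group and keep the first row position of each group.
result <- ddply(review, .(Compound, Replicate), summarise,
                first_row  = min(row_id),
                mean_value = mean(value))

# Restore first-appearance order, then drop the helper column.
result <- result[order(result$first_row), ]
result$first_row <- NULL
rownames(result) <- NULL
```

The helper column is never used as a grouping variable, so it does not change how the groups are formed; it only supplies a sort key afterwards.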

How to fill NA with median?

Example data:

    set.seed(1)
    df <- data.frame(years=sort(rep(2005:2010, 12)), months=1:12,
                     value=c(rnorm(60),NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
    head(df)
      years months      value
    1  2005      1 -0.6264538
    2  2005      2  0.1836433
    3  2005      3 -0.8356286
    4  2005      4  1.5952808
    5  2005      5  0.3295078
    6  2005      6 -0.8204684

How can I replace the NA values in df$value with the median of the other months? Each "value" must contain the median of all previous values for the same month. That is, if the current month is May, "value" must contain the median of all previous values for the month of May.

Luciano Selzer: Or with ave df <-
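The answer excerpt above mentions `ave` but is cut off, so the following is only a sketch of how a per-month median fill could look in base R, not a reconstruction of that answer. It assumes, as in the example data, that the missing values are all in the final year, so "all previous values for the same month" coincides with "all observed values for that month". The helper name `fill_with_median` is hypothetical.

```r
set.seed(1)
df <- data.frame(years  = sort(rep(2005:2010, 12)),
                 months = 1:12,
                 value  = c(rnorm(60), rep(NA, 12)))

# Replace the NAs within one group by the median of that group's observed values.
fill_with_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# ave() applies the helper separately to the values of each month.
df$value <- ave(df$value, df$months, FUN = fill_with_median)
```

If missing values could also occur in earlier years, a true "previous values only" rule would need a running median per month instead.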

Aggregating sub totals and grand totals with data.table

I've got a data.table in R:

    library(data.table)
    set.seed(1)
    DT = data.table(
      group=sample(letters[1:2],100,replace=TRUE),
      year=sample(2010:2012,100,replace=TRUE),
      v=runif(100))

Aggregating this data into a summary table by group and year is simple and elegant:

    table <- DT[,mean(v),by='group, year']

However, aggregating this data into a summary table, including subtotals and grand totals, is a little more difficult, and a lot less elegant:

    library(plyr)
    yearTot <- DT[,list(mean(v),year='Total'),by='group']
    groupTot <- DT[,list(mean(v),group='Total'),by='year']
    Tot <- DT[,list(mean(v), year=
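The pattern the question describes can be sketched with data.table alone, without plyr: build each level of aggregation separately, label the collapsed dimension "Total", and stack the pieces with `rbind`. The column name `mean_v` and the object names are assumptions made for this illustration, and `year` is converted to character so it can hold the "Total" label.

```r
library(data.table)

set.seed(1)
DT <- data.table(
  group = sample(letters[1:2], 100, replace = TRUE),
  year  = sample(2010:2012, 100, replace = TRUE),
  v     = runif(100)
)

# Detail rows: one mean per group and year (year as character for stacking).
by_both  <- DT[, .(mean_v = mean(v)), by = .(group, year = as.character(year))]
# Subtotals per group, across all years.
by_group <- DT[, .(year = "Total", mean_v = mean(v)), by = group]
# Subtotals per year, across all groups.
by_year  <- DT[, .(group = "Total", mean_v = mean(v)), by = .(year = as.character(year))]
# Grand total.
overall  <- DT[, .(group = "Total", year = "Total", mean_v = mean(v))]

# Stack the pieces, matching columns by name rather than by position.
summary_table <- rbind(by_both, by_group, by_year, overall, use.names = TRUE)
```

Because the subtotals are computed from the raw rows, they are means over observations, not means of the per-cell means.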

quick/elegant way to construct mean/variance summary table

I can achieve this task, but I feel like there must be a "best" (slickest, most compact, clearest-code, fastest?) way of doing it and have not figured it out so far ... For a specified set of categorical factors I want to construct a table of means and variances by group.

Generate data:

    set.seed(1001)
    d <- expand.grid(f1=LETTERS[1:3],f2=letters[1:3],
                     f3=factor(as.character(as.roman(1:3))),rep=1:4)
    d$y <- runif(nrow(d))
    d$z <- rnorm(nrow(d))

Desired output:

      f1 f2  f3    y.mean      y.var
    1  A  a   I 0.6502307 0.09537958
    2  A  a  II 0.4876630 0.11079670
    3  A  a III 0.3102926 0.20280568
    4  A  b   I 0.3914084 0
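One compact option for the table described above, shown as a sketch and not claimed to be the "best" answer the asker is after, is `ddply` with `summarise` from plyr. The object name `summary_table` is an assumption; the output column names follow the desired output in the question.

```r
library(plyr)

set.seed(1001)
d <- expand.grid(f1 = LETTERS[1:3], f2 = letters[1:3],
                 f3 = factor(as.character(as.roman(1:3))), rep = 1:4)
d$y <- runif(nrow(d))
d$z <- rnorm(nrow(d))

# One row per combination of f1, f2 and f3, with the mean and variance of y.
summary_table <- ddply(d, .(f1, f2, f3), summarise,
                       y.mean = mean(y),
                       y.var  = var(y))
```

Adding further columns such as `z.mean = mean(z)` to the same call would extend the table to the second variable.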

Efficient alternatives to merge for larger data.frames R

I am looking for a method to merge two larger (size>1 million / 300 KB RData file) data frames that is efficient both in computing resources and in the effort needed to learn and implement it. "merge" in base R and "join" in plyr appear to use up all my memory, effectively crashing my system.

Example: load the test data frame and try test.merged<-merge(test, test) or test.merged<-join(test, test, type="all").

The following post provides a list of merge methods and alternatives: How to join (merge) data frames (inner, outer, left, right)?

The following allows object size inspection: https://heuristically.wordpress.com/2010/01/04

replace NA with groups mean in a non specified number of columns [duplicate]

This question already has an answer here: How to replace NA with mean by subset in R (impute with plyr?)

I want to replace each NA with the mean of its group (collembola or mite) in multiple columns. Here is an example with 3 columns; however, I want to apply this to a data frame with 5000 columns.

    dat <- read.table(text = "id ID length width extra
    101 collembola 2.1 0.9 1
    102 mite NA 0.7 NA
    103 mite 1.1 0.8 2
    104 collembola 1 NA 3
    105 collembola 1.5 0.5 4
    106 mite NA NA NA
    106 mite 1.9 NA 4", header=TRUE)

It works if I enter each column:

    library(plyr)
    impute.mean <- function(x)
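The asker's `impute.mean` is cut off in the excerpt, so the sketch below uses its own hypothetical helper, `impute_mean`, and base R only. It selects every numeric column except `id` programmatically, which is one way to avoid naming 5000 columns by hand; treating "all numeric columns other than `id`" as the columns to impute is an assumption.

```r
dat <- read.table(text = "id ID length width extra
101 collembola 2.1 0.9 1
102 mite NA 0.7 NA
103 mite 1.1 0.8 2
104 collembola 1 NA 3
105 collembola 1.5 0.5 4
106 mite NA NA NA
106 mite 1.9 NA 4", header = TRUE)

# Hypothetical helper: replace NAs in a vector by the mean of its observed values.
impute_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Every numeric column except the identifier is imputed.
value_cols <- setdiff(names(dat)[sapply(dat, is.numeric)], "id")

# ave() applies the helper within each ID group, column by column.
dat[value_cols] <- lapply(dat[value_cols], function(col) {
  ave(as.numeric(col), dat$ID, FUN = impute_mean)
})
```

A group whose values are all missing in some column would receive NaN there, so such columns may need separate handling.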

How to expand a large dataframe in R

I have a dataframe

    df <- data.frame(
      id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4),
      date = c("1985-06-19", "1985-06-19", "1985-06-19", "1985-08-01", "1985-08-01",
               "1990-06-19", "1990-06-19", "1990-06-19", "1990-06-19", "2000-05-12"),
      spp = c("a", "b", "c", "c", "d", "b", "c", "d", "a", "b"),
      y = rpois(10, 5))

       id       date spp y
    1   1 1985-06-19   a 6
    2   1 1985-06-19   b 3
    3   1 1985-06-19   c 7
    4   2 1985-08-01   c 7
    5   2 1985-08-01   d 6
    6   3 1990-06-19   b 5
    7   3 1990-06-19   c 4
    8   3 1990-06-19   d 4
    9   3 1990-06-19   a 6
    10  4 2000-05-12   b 6

I want to expand it so that there is every combination of id and spp and have y = 0 for every
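A base R sketch of the expansion described above: build the full grid of `id` and `spp` with `expand.grid`, merge the observed rows onto it, and set the unmatched `y` values to 0. The `y` values are copied from the printed output so the example is reproducible. Because the excerpt is cut off, two points are assumptions: that the combinations missing from the data are the ones that should get `y = 0`, and that each new row should inherit the date of its `id`.

```r
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4),
  date = c("1985-06-19", "1985-06-19", "1985-06-19", "1985-08-01", "1985-08-01",
           "1990-06-19", "1990-06-19", "1990-06-19", "1990-06-19", "2000-05-12"),
  spp  = c("a", "b", "c", "c", "d", "b", "c", "d", "a", "b"),
  y    = c(6, 3, 7, 7, 6, 5, 4, 4, 6, 6),
  stringsAsFactors = FALSE
)

# Every combination of id and spp.
grid <- expand.grid(id = unique(df$id), spp = unique(df$spp),
                    stringsAsFactors = FALSE)

# Attach the date belonging to each id (assumes one date per id).
grid <- merge(grid, unique(df[c("id", "date")]), by = "id")

# Left-join the observed counts and fill the gaps with 0.
expanded <- merge(grid, df, by = c("id", "date", "spp"), all.x = TRUE)
expanded$y[is.na(expanded$y)] <- 0
```

For a large data frame the same idea carries over to a keyed data.table join, which tends to use less memory than `merge` on data frames.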

Split Data Frame into Rows of Fixed Size

I have a bunch of data frames of varying lengths, ranging from approx. 15,000 to 500,000 rows. I would like to split each of these data frames into smaller data frames of 300 rows each, on which I would do further processing. How can I do this? This (Split up a dataframe by number of rows) provides a partial answer, but it doesn't work because not all my data frames have lengths that are multiples of 300. I would greatly appreciate it if both a plyr and a non-plyr solution could be provided. Thank you!

I don't understand why a plyr solution is needed. split works perfectly well
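The answer excerpt points to `split`; a minimal sketch of that idea, with a hypothetical 1,000-row data frame standing in for the asker's data, assigns each row a chunk number and splits on it. The last chunk simply holds the remaining rows when the row count is not a multiple of 300.

```r
# Hypothetical data frame whose row count is not a multiple of 300.
df <- data.frame(x = seq_len(1000))

chunk_size <- 300

# Rows 1-300 get chunk 1, rows 301-600 get chunk 2, and so on.
chunk_id <- ceiling(seq_len(nrow(df)) / chunk_size)

# A named list of data frames, one per chunk.
chunks <- split(df, chunk_id)
```

Further processing can then be applied to every piece with `lapply(chunks, ...)`.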

Summary of proportions by group

What would be the best tool/package to use to calculate proportions by subgroups? I thought I could try something like this:

    data(mtcars)
    library(plyr)
    ddply(mtcars, .(cyl), transform, Pct = gear/length(gear))

But the output is not what I want, as I would want something with a number of rows equal to the number of distinct values of cyl. Even if I change it to summarise I still get the same problem. I am open to other packages, but I thought plyr would be best as I would eventually like to build a function around this. Any
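The question is cut off before it states exactly which proportion is wanted, so the sketch below covers only one reading: the share of all rows that falls into each `cyl` group, giving one row per group. It uses base R's `table` and `prop.table`; the column name `Pct` is borrowed from the question and the object name `group_share` is an assumption.

```r
data(mtcars)

# Count rows per cyl, convert counts to proportions, and return a data frame
# with one row per cyl value.
group_share <- as.data.frame(prop.table(table(cyl = mtcars$cyl)),
                             responseName = "Pct")
```

If the intended quantity is instead the distribution of `gear` within each `cyl`, a two-way `table(mtcars$cyl, mtcars$gear)` passed to `prop.table` with `margin = 1` would be the analogous starting point.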

How to get summary statistics for multiple variables by multiple groups?

I know that there are many answers provided in this forum on how to get summary statistics (e.g. mean, se, N) for multiple groups using options like aggregate, ddply or data.table. I'm not sure, however, how to apply these functions over multiple columns at once. More specifically, I would like to know how to extend the following ddply command over multiple columns (dv1, dv2, dv3) without re-typing the code with a different variable name each time.

    library(reshape2)
    library(plyr)
    group1 <- c
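The asker's data and ddply call are cut off, so the following is only a sketch on hypothetical data with the column names mentioned in the question (`group1`, `dv1`, `dv2`, `dv3`) plus an assumed second grouping column `group2`. It uses base R's `aggregate` with `cbind` on the left-hand side of the formula, which applies one function to several columns in a single call; only the mean is shown.

```r
set.seed(42)

# Hypothetical data; the real data frame is not shown in the excerpt.
dat <- data.frame(
  group1 = rep(c("a", "b"), each = 6),
  group2 = rep(c("x", "y"), times = 6),
  dv1    = rnorm(12),
  dv2    = rnorm(12),
  dv3    = rnorm(12)
)

# Mean of dv1, dv2 and dv3 for every combination of group1 and group2.
means <- aggregate(cbind(dv1, dv2, dv3) ~ group1 + group2,
                   data = dat, FUN = mean)
```

Other statistics such as the standard error or N can be obtained by repeating the call with a different `FUN`. Note that the formula interface drops rows with missing values in any of the listed columns by default.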