plyr

How to ddply() without sorting?

[亡魂溺海] posted on 2019-11-28 20:53:53
I use the following code to summarize my data, grouped by Compound, Replicate and Mass:

    summaryDataFrame <- ddply(reviewDataFrame, .(Compound, Replicate, Mass), .fun = calculate_T60_Over_T0_Ratio)

An unfortunate side effect is that the resulting data frame is sorted by those fields. I would like to keep Compound, Replicate and Mass in the same order as in the original data frame. Any ideas? I tried adding a "Sorting" column of sequential integers to the original data, but of course I can't include that in the .variables since I don't want to 'group by' that, and so it is not
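One possible way to keep the groups in their order of first appearance, sketched here in base R rather than with ddply. The data frame `review`, its `Signal` column and the `summarise_group` helper are hypothetical stand-ins, since the real `reviewDataFrame` and `calculate_T60_Over_T0_Ratio` are not shown in the question.

```r
# Hypothetical stand-in for reviewDataFrame; the Signal column is an assumption.
review <- data.frame(
  Compound  = c("B", "B", "A", "A", "B"),
  Replicate = c(2, 2, 1, 1, 1),
  Mass      = c(10, 10, 5, 5, 10),
  Signal    = c(1, 3, 2, 4, 6),
  stringsAsFactors = FALSE
)

# One key per row; fixing the factor levels in order of first appearance
# makes split() return the groups in the original order instead of sorted.
key <- paste(review$Compound, review$Replicate, review$Mass, sep = "|")
key <- factor(key, levels = unique(key))

# Placeholder summary function (the real one is not part of the question).
summarise_group <- function(g) {
  data.frame(g[1, c("Compound", "Replicate", "Mass")],
             MeanSignal = mean(g$Signal))
}

summary_df <- do.call(rbind, lapply(split(review, key), summarise_group))
rownames(summary_df) <- NULL
```

The same idea should carry over to ddply by grouping on a single factor whose levels are set in order of appearance, but that variant is untested here.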

How to fill NA with median?

走远了吗. posted on 2019-11-28 20:48:55
Example data:

    set.seed(1)
    df <- data.frame(years=sort(rep(2005:2010, 12)), months=1:12, value=c(rnorm(60),NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
    head(df)
      years months      value
    1  2005      1 -0.6264538
    2  2005      2  0.1836433
    3  2005      3 -0.8356286
    4  2005      4  1.5952808
    5  2005      5  0.3295078
    6  2005      6 -0.8204684

How can I replace the NA values in df$value with the median of the other months? "value" must contain the median of all previous values for the same month. That is, if the current month is May, "value" must contain the median of all previous values for the month of May.

Luciano Selzer
Or with ave df <-
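A minimal sketch of an ave-based fill (not necessarily the truncated answer above). It assumes that "previous values" can be read as "all non-missing values for the same month", which holds for the example data because the NAs only occur in the final year.

```r
set.seed(1)
df <- data.frame(years  = sort(rep(2005:2010, 12)),
                 months = 1:12,
                 value  = c(rnorm(60), rep(NA, 12)))

# Within each month, replace missing values with that month's median
# of the observed values; ave() keeps the original row order.
df$value <- ave(df$value, df$months, FUN = function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
})
```

If NAs could appear in the middle of the series, a strictly "previous values only" median would need a running calculation instead.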

Aggregating sub totals and grand totals with data.table

独自空忆成欢 posted on 2019-11-28 19:40:44
I've got a data.table in R:

    library(data.table)
    set.seed(1)
    DT = data.table( group=sample(letters[1:2],100,replace=TRUE), year=sample(2010:2012,100,replace=TRUE), v=runif(100))

Aggregating this data into a summary table by group and year is simple and elegant:

    table <- DT[,mean(v),by='group, year']

However, aggregating this data into a summary table, including subtotals and grand totals, is a little more difficult, and a lot less elegant:

    library(plyr)
    yearTot <- DT[,list(mean(v),year='Total'),by='group']
    groupTot <- DT[,list(mean(v),group='Total'),by='year']
    Tot <- DT[,list(mean(v), year=
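A possible shortcut, assuming a data.table version that ships the grouping-set helpers (`cube()`, `rollup()`, `groupingsets()`). The small deterministic table below is a stand-in for the random DT in the question, and the column name `mean_v` is an assumption.

```r
library(data.table)

# Deterministic stand-in for the question's DT.
DT <- data.table(group = rep(c("a", "b"), each = 6),
                 year  = rep(2010:2012, times = 4),
                 v     = 1:12)

# cube() aggregates over every combination of the `by` columns:
# group x year cells, per-group subtotals, per-year subtotals and the
# grand total. Subtotal rows carry NA in the column that was collapsed.
totals <- cube(DT, j = list(mean_v = mean(v)), by = c("group", "year"))
```

The NA markers can be relabelled as "Total" afterwards if a printed report needs it; `year` would have to be converted to character first.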

quick/elegant way to construct mean/variance summary table

隐身守侯 posted on 2019-11-28 18:46:17
I can achieve this task, but I feel like there must be a "best" (slickest, most compact, clearest-code, fastest?) way of doing it and have not figured it out so far ... For a specified set of categorical factors I want to construct a table of means and variances by group.

generate data:

    set.seed(1001)
    d <- expand.grid(f1=LETTERS[1:3],f2=letters[1:3], f3=factor(as.character(as.roman(1:3))),rep=1:4)
    d$y <- runif(nrow(d))
    d$z <- rnorm(nrow(d))

desired output:

      f1 f2  f3    y.mean      y.var
    1  A  a   I 0.6502307 0.09537958
    2  A  a  II 0.4876630 0.11079670
    3  A  a III 0.3102926 0.20280568
    4  A  b   I 0.3914084 0
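One compact base-R option, offered as a sketch rather than the "best" answer: let `aggregate()` return both statistics at once and then flatten the resulting matrix column. The names `agg` and `summary_tab` are introduced here for illustration.

```r
set.seed(1001)
d <- expand.grid(f1 = LETTERS[1:3], f2 = letters[1:3],
                 f3 = factor(as.character(as.roman(1:3))), rep = 1:4)
d$y <- runif(nrow(d))
d$z <- rnorm(nrow(d))

# FUN returns a named vector, so `y` comes back as a two-column matrix.
agg <- aggregate(y ~ f1 + f2 + f3, data = d,
                 FUN = function(x) c(mean = mean(x), var = var(x)))

# Flatten the matrix column into ordinary columns y.mean and y.var,
# then order rows so that f3 varies fastest, as in the desired output.
summary_tab <- do.call(data.frame, agg)
summary_tab <- summary_tab[order(summary_tab$f1, summary_tab$f2, summary_tab$f3), ]
rownames(summary_tab) <- NULL
```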

Efficient alternatives to merge for larger data.frames R

假装没事ソ posted on 2019-11-28 16:43:10
I am looking for an efficient (both computer resource wise and learning/implementation wise) method to merge two larger (size>1 million / 300 KB RData file) data frames. "merge" in base R and "join" in plyr appear to use up all my memory, effectively crashing my system. For example, load the test data frame and try test.merged<-merge(test, test) or test.merged<-join(test, test, type="all") - The following post provides a list of merge and alternatives: How to join (merge) data frames (inner, outer, left, right)? The following allows object size inspection: https://heuristically.wordpress.com/2010/01/04
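For the common case of a left join against a lookup table, a `match()`-based sketch avoids much of the intermediate copying that `merge()` does. This assumes the lookup table has at most one row per key; the data frames `left` and `right` and the key column `id` are hypothetical, since the question's `test` data is not shown.

```r
# Hypothetical tables; `right` must have unique ids for this to be a valid join.
left  <- data.frame(id = c(3, 1, 2, 3), x = c("c", "a", "b", "d"))
right <- data.frame(id = c(1, 2, 3), y = c(10, 20, 30))

# Position of each left key in the lookup table (NA when there is no match).
idx <- match(left$id, right$id)

# Pull the non-key columns of `right` in the row order of `left`.
merged <- cbind(left, right[idx, setdiff(names(right), "id"), drop = FALSE])
rownames(merged) <- NULL
```

For many-to-many joins this shortcut does not apply, and a keyed data.table join is the alternative usually suggested.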

replace NA with groups mean in a non specified number of columns [duplicate]

早过忘川 posted on 2019-11-28 14:38:23
This question already has an answer here: How to replace NA with mean by subset in R (impute with plyr?) (3 answers)

I want to replace the NA values with the mean of each group (collembola and mite) in multiple columns. Here is an example with 3 columns; however, I want to apply this to a data frame with 5000 columns:

    dat <- read.table(text = "id ID length width extra
    101 collembola 2.1 0.9 1
    102 mite NA 0.7 NA
    103 mite 1.1 0.8 2
    104 collembola 1 NA 3
    105 collembola 1.5 0.5 4
    106 mite NA NA NA
    106 mite 1.9 NA 4", header=TRUE)

It works if I enter each column:

    library(plyr)
    impute.mean <- function(x)
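A sketch that avoids typing each column, using base R's `ave()` inside `lapply()`. The helper `impute_mean` is written out here as an assumption about what the truncated `impute.mean` does; `value_cols` is a name introduced for illustration.

```r
dat <- read.table(text = "id ID length width extra
101 collembola 2.1 0.9 1
102 mite NA 0.7 NA
103 mite 1.1 0.8 2
104 collembola 1 NA 3
105 collembola 1.5 0.5 4
106 mite NA NA NA
106 mite 1.9 NA 4", header = TRUE)

# Replace missing entries of a vector with the mean of its observed entries.
impute_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Every column except the identifiers is imputed, however many there are.
value_cols <- setdiff(names(dat), c("id", "ID"))
dat[value_cols] <- lapply(dat[value_cols],
                          function(col) ave(col, dat$ID, FUN = impute_mean))
```

A group in which a column is entirely NA would end up with NaN, so such columns may need separate handling.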

How to expand a large dataframe in R

风流意气都作罢 posted on 2019-11-28 12:58:55
I have a dataframe

    df <- data.frame( id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4), date = c("1985-06-19", "1985-06-19", "1985-06-19", "1985-08-01", "1985-08-01", "1990-06-19", "1990-06-19", "1990-06-19", "1990-06-19", "2000-05-12"), spp = c("a", "b", "c", "c", "d", "b", "c", "d", "a", "b"), y = rpois(10, 5))

       id       date spp y
    1   1 1985-06-19   a 6
    2   1 1985-06-19   b 3
    3   1 1985-06-19   c 7
    4   2 1985-08-01   c 7
    5   2 1985-08-01   d 6
    6   3 1990-06-19   b 5
    7   3 1990-06-19   c 4
    8   3 1990-06-19   d 4
    9   3 1990-06-19   a 6
    10  4 2000-05-12   b 6

I want to expand it so that there is every combination of id and spp and have y = 0 for every
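A base-R sketch of the expansion, assuming each id has a single date and that missing id/spp combinations should get y = 0. The y values are copied from the printed output above because `rpois()` was called without a seed; `ids`, `full` and `expanded` are names introduced here.

```r
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4),
  date = c("1985-06-19", "1985-06-19", "1985-06-19", "1985-08-01", "1985-08-01",
           "1990-06-19", "1990-06-19", "1990-06-19", "1990-06-19", "2000-05-12"),
  spp  = c("a", "b", "c", "c", "d", "b", "c", "d", "a", "b"),
  y    = c(6, 3, 7, 7, 6, 5, 4, 4, 6, 6)
)

# One row per id (with its date), crossed with every species:
# merge() with no shared columns returns the Cartesian product.
ids  <- unique(df[, c("id", "date")])
full <- merge(ids, data.frame(spp = sort(unique(df$spp))))

# Left-join the observed counts and fill the gaps with zero.
expanded <- merge(full, df, by = c("id", "date", "spp"), all.x = TRUE)
expanded$y[is.na(expanded$y)] <- 0
```

On a very large data frame the two `merge()` calls may become the bottleneck, in which case a keyed join in data.table is one alternative to try.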

Split Data Frame into Rows of Fixed Size

笑着哭i posted on 2019-11-28 12:23:01
I have a bunch of data frames of varying length, ranging from approx. 15,000 to 500,000 rows. I would like to split each of these data frames into smaller data frames of 300 rows each, on which I would do further processing. How can I do this? This ( Split up a dataframe by number of rows ) provides a partial answer, but it doesn't work because not all my data frames have lengths that are multiples of 300. I would greatly appreciate it if both a plyr and a non-plyr solution could be provided. Thank you!

I don't understand why a plyr solution is needed. split works perfectly well
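A minimal sketch of the split-based approach mentioned above. The data frame `df` with 1000 rows is a hypothetical example chosen so that the row count is not a multiple of 300.

```r
chunk_size <- 300
df <- data.frame(x = seq_len(1000))  # hypothetical data, not a multiple of 300

# Rows 1-300 get group 1, rows 301-600 group 2, and so on;
# the last group simply holds whatever rows are left over.
groups <- ceiling(seq_len(nrow(df)) / chunk_size)
chunks <- split(df, groups)
```

Each element of `chunks` is itself a data frame, so further processing can be applied with `lapply(chunks, ...)`.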

Summary of proportions by group

。_饼干妹妹 posted on 2019-11-28 11:07:26
Question

What would be the best tool/package to use to calculate proportions by subgroups? I thought I could try something like this:

    data(mtcars)
    library(plyr)
    ddply(mtcars, .(cyl), transform, Pct = gear/length(gear))

But the output is not what I want, as I would want something with a number of rows equal to cyl . Even if I change it to summarise I still get the same problem. I am open to other packages, but I thought plyr would be best as I would eventually like to build a function around this. Any
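One reading of the goal is "the share of each gear value within each cyl group, one row per cyl". Under that assumption, a base-R sketch with `table()` and `prop.table()` gives a three-row result; the name `tab` is introduced here.

```r
data(mtcars)

# Cross-tabulate cyl by gear, then convert counts to row-wise proportions
# (margin = 1 means each cyl row sums to 1).
tab <- prop.table(table(cyl = mtcars$cyl, gear = mtcars$gear), margin = 1)

# Optional: a plain data frame with one row per cyl and one column per gear.
tab_df <- as.data.frame.matrix(tab)
```

Wrapping this in a function only needs the two column names as arguments, for example by indexing with `mtcars[[group_col]]`.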

How to get summary statistics for multiple variables by multiple groups?

余生颓废 posted on 2019-11-28 10:32:23
Question

I know that there are many answers provided in this forum on how to get summary statistics (e.g. mean, se, N) for multiple groups using options like aggregate , ddply or data.table . I'm not sure, however, how to apply these functions over multiple columns at once. More specifically, I would like to know how to extend the following ddply command over multiple columns (dv1, dv2, dv3) without re-typing the code with a different variable name each time.

    library(reshape2)
    library(plyr)
    group1 <- c
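A sketch of one way to cover several columns in a single call, using base R's `aggregate()` instead of ddply. The data frame `dat`, the second grouping column `group2` and the helper `stats` are hypothetical, since the question's data is cut off; only the names dv1, dv2, dv3 and group1 come from the question.

```r
set.seed(42)
# Hypothetical data: two grouping columns and three dependent variables.
dat <- data.frame(group1 = rep(c("a", "b"), each = 6),
                  group2 = rep(c("x", "y"), times = 6),
                  dv1 = rnorm(12), dv2 = rnorm(12), dv3 = rnorm(12))

# Mean, standard error and count for one vector.
stats <- function(x) c(mean = mean(x), se = sd(x) / sqrt(length(x)), N = length(x))

# cbind() on the left-hand side applies `stats` to every listed column.
res <- aggregate(cbind(dv1, dv2, dv3) ~ group1 + group2, data = dat, FUN = stats)

# Flatten the matrix columns into dv1.mean, dv1.se, dv1.N, and so on.
res <- do.call(data.frame, res)
```

Note that the formula interface drops rows with an NA in any of the listed variables before aggregating, which may or may not be the desired behaviour.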