plyr

How to ddply() without sorting?

旧街凉风 posted on 2019-11-27 13:17:12
Question: I use the following code to summarize my data, grouped by Compound, Replicate and Mass.

    summaryDataFrame <- ddply(reviewDataFrame, .(Compound, Replicate, Mass), .fun = calculate_T60_Over_T0_Ratio)

An unfortunate side effect is that the resulting data frame is sorted by those fields. I would like to do this and keep Compound, Replicate and Mass in the same order as in the original data frame. Any ideas? I tried adding a "Sorting" column of sequential integers to the original data, but of
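One possible way to keep the original group order is to summarise first and then reorder the result by each group's first appearance in the input. The sketch below is base R rather than plyr; the toy data, the `Signal` column and the use of `mean` in place of `calculate_T60_Over_T0_Ratio` are all assumptions made for illustration.

```r
# Toy stand-in for reviewDataFrame; all values are made up for illustration
review <- data.frame(
  Compound  = c("C2", "C2", "C1", "C1", "C3", "C3"),
  Replicate = c(1, 1, 2, 2, 1, 1),
  Mass      = c(300, 300, 100, 100, 200, 200),
  Signal    = c(4, 6, 1, 3, 10, 20),
  stringsAsFactors = FALSE
)
keys <- c("Compound", "Replicate", "Mass")

# aggregate() returns the groups in sorted order, much like ddply()
summary_sorted <- aggregate(Signal ~ Compound + Replicate + Mass,
                            data = review, FUN = mean)

# Groups in order of first appearance in the original data
first_seen <- unique(review[keys])

# Collapse the key columns into one string per row, then match to restore order
key_string <- function(df) do.call(paste, c(df[keys], sep = "\r"))
summary_original <- summary_sorted[
  match(key_string(first_seen), key_string(summary_sorted)), ]
rownames(summary_original) <- NULL
```

The same reordering step should apply to a `ddply()` result, since it only relies on the key columns being present in the summary.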

R - Faster Way to Calculate Rolling Statistics Over a Variable Interval

試著忘記壹切 posted on 2019-11-27 12:56:07
I'm curious if anyone out there can come up with a (faster) way to calculate rolling statistics (rolling mean, median, percentiles, etc.) over a variable interval of time (windowing). That is, suppose one is given randomly timed observations (i.e. not daily or weekly data; the observations just have a time stamp, as in tick data), and suppose you'd like to look at center and dispersion statistics while being able to widen and tighten the interval of time over which these statistics are calculated. I made a simple for loop that does this. But it obviously runs very slowly (In fact I think my loop
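As a rough illustration of the windowing idea, the sketch below uses `findInterval()` to locate the start of each time window once, instead of rescanning every time stamp for every observation. The function name `rolling_stat`, the trailing window `[t - width, t]` and the simulated data are assumptions, not the asker's code.

```r
# Rolling statistic over a trailing time window [t - width, t].
# Assumes `time` is sorted in ascending order.
rolling_stat <- function(time, value, width, stat = mean) {
  # Number of observations strictly before (t - width), plus one,
  # gives the index of the first observation inside each window
  start <- findInterval(time - width, time, left.open = TRUE) + 1L
  vapply(seq_along(time),
         function(i) stat(value[start[i]:i]),
         numeric(1))
}

# Irregularly spaced observations (simulated for illustration)
set.seed(42)
obs_time  <- sort(runif(200, 0, 100))
obs_value <- rnorm(200)

roll_mean   <- rolling_stat(obs_time, obs_value, width = 5)
roll_median <- rolling_stat(obs_time, obs_value, width = 5, stat = median)
```

Changing `width` widens or tightens the interval; any function that returns a single number can be passed as `stat`.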

Compute rolling sum by id variables, with missing timepoints

半城伤御伤魂 posted on 2019-11-27 12:38:49
Question: I'm trying to learn R and there are a few things I've done for 10+ years in SAS that I cannot quite figure out the best way to do in R. Take this data:

    id class t          count desired
    -- ----- ---------- ----- -------
     1 A     2010-01-15     1       1
     1 A     2010-02-15     2       3
     1 B     2010-04-15     3       3
     1 B     2010-09-15     4       4
     2 A     2010-01-15     5       5
     2 B     2010-06-15     6       6
     2 B     2010-08-15     7      13
     2 B     2010-09-15     8      21

I want to calculate the column desired as a rolling sum by id, class, and within a 4-month rolling window. Notice that not all
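A minimal base R sketch of this rolling sum, using the data from the question. It assumes the window covers dates strictly later than four calendar months before the current row, up to and including the current row; that reading reproduces the `desired` column, but the exact boundary rule is a guess. The helper `months_back` is a hypothetical name.

```r
d <- data.frame(
  id    = c(1, 1, 1, 1, 2, 2, 2, 2),
  class = c("A", "A", "B", "B", "A", "B", "B", "B"),
  t     = as.Date(c("2010-01-15", "2010-02-15", "2010-04-15", "2010-09-15",
                    "2010-01-15", "2010-06-15", "2010-08-15", "2010-09-15")),
  count   = 1:8,
  desired = c(1, 3, 3, 4, 5, 6, 13, 21),
  stringsAsFactors = FALSE
)

# Date that lies n calendar months before `date` (hypothetical helper)
months_back <- function(date, n) {
  seq(date, by = paste0("-", n, " months"), length.out = 2)[2]
}

# For each row, sum `count` over rows with the same id and class whose
# date falls in the window (t - 4 months, t]
d$rolling <- vapply(seq_len(nrow(d)), function(i) {
  lower <- months_back(d$t[i], 4)
  in_window <- d$id == d$id[i] & d$class == d$class[i] &
    d$t > lower & d$t <= d$t[i]
  sum(as.numeric(d$count[in_window]))
}, numeric(1))
```

This compares every row against every other row, so it is only meant to make the window logic explicit, not to be efficient on large data.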

Aggregating sub totals and grand totals with data.table

限于喜欢 posted on 2019-11-27 12:28:04
Question: I've got a data.table in R:

    library(data.table)
    set.seed(1)
    DT = data.table(
      group=sample(letters[1:2],100,replace=TRUE),
      year=sample(2010:2012,100,replace=TRUE),
      v=runif(100))

Aggregating this data into a summary table by group and year is simple and elegant:

    table <- DT[,mean(v),by='group, year']

However, aggregating this data into a summary table, including subtotals and grand totals, is a little more difficult, and a lot less elegant:

    library(plyr)
    yearTot <- DT[,list(mean(v),year='Total'
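For comparison, here is one way the subtotal and grand-total rows could be assembled in base R, using a plain data frame instead of a data.table. This is a sketch of the general stacking approach (one aggregate per margin, then `rbind`), not the method the question goes on to show; the names `by_cell`, `by_group`, `by_year` and `grand` are made up.

```r
set.seed(1)
DF <- data.frame(
  group = sample(letters[1:2], 100, replace = TRUE),
  year  = sample(2010:2012, 100, replace = TRUE),
  v     = runif(100),
  stringsAsFactors = FALSE
)

# One aggregate per level of detail
by_cell  <- aggregate(v ~ group + year, data = DF, FUN = mean)
by_group <- aggregate(v ~ group, data = DF, FUN = mean)
by_year  <- aggregate(v ~ year, data = DF, FUN = mean)

# Label the margins with "Total"; year becomes character so the rows can stack
by_cell$year  <- as.character(by_cell$year)
by_year$year  <- as.character(by_year$year)
by_year$group <- "Total"
by_group$year <- "Total"
grand <- data.frame(group = "Total", year = "Total", v = mean(DF$v),
                    stringsAsFactors = FALSE)

cols <- c("group", "year", "v")
summary_table <- rbind(by_cell[cols], by_year[cols], by_group[cols], grand[cols])
```

Each margin is computed from the raw rows rather than from the cell means, so the totals are means over observations, not means of means.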

How to merge two data frames on common columns in R with sum of others?

丶灬走出姿态 posted on 2019-11-27 12:25:01
R Version 2.11.1 32-bit on Windows 7. I have two data sets, data_A and data_B:

    data_A
    USER_A USER_B ACTION
         1     11   0.3
         1     13   0.25
         1     16   0.63
         1     17   0.26
         2     11   0.14
         2     14   0.28

    data_B
    USER_A USER_B ACTION
         1     13   0.17
         1     14   0.27
         2     11   0.25

Now I want to add the ACTION of data_B to data_A where their USER_A and USER_B are equal. For the example above, the result would be:

    data_A
    USER_A USER_B ACTION
         1     11   0.3
         1     13   0.25+0.17
         1     16   0.63
         1     17   0.26
         2     11   0.14+0.25
         2     14   0.28

So how could I achieve it?

You can use ddply in package plyr and combine it with merge:

    library(plyr)
    ddply(merge(data_A, data_B, all.x=TRUE), .
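The quoted answer is cut off, so the following is an independent base R sketch of the same task rather than a completion of it: left-join `data_B` onto `data_A` by both user columns, then add the matched ACTION values, treating missing matches as zero. The suffixes `_A` and `_B` are chosen here for illustration.

```r
data_A <- data.frame(
  USER_A = c(1, 1, 1, 1, 2, 2),
  USER_B = c(11, 13, 16, 17, 11, 14),
  ACTION = c(0.3, 0.25, 0.63, 0.26, 0.14, 0.28)
)
data_B <- data.frame(
  USER_A = c(1, 1, 2),
  USER_B = c(13, 14, 11),
  ACTION = c(0.17, 0.27, 0.25)
)

# Left join: keep every row of data_A, attach data_B's ACTION where keys match
merged <- merge(data_A, data_B, by = c("USER_A", "USER_B"),
                all.x = TRUE, suffixes = c("_A", "_B"))

# Rows without a match get NA for ACTION_B; count those as 0 when summing
merged$ACTION <- merged$ACTION_A +
  ifelse(is.na(merged$ACTION_B), 0, merged$ACTION_B)

result <- merged[c("USER_A", "USER_B", "ACTION")]
```

Note that `merge()` sorts the output by the key columns by default, and that the row of `data_B` with no partner in `data_A` (1, 14) is dropped, which matches the result shown in the question.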

ddply for sum by group in R

爱⌒轻易说出口 posted on 2019-11-27 12:24:08
I have a sample dataframe "data" as follows:

          X      Y Month Year income
    2281205 228120     3 2011   1000
    2281212 228121     9 2010   1100
    2281213 228121    12 2010    900
    2281214 228121     3 2011   9000
    2281222 228122     6 2010   1111
    2281223 228122     9 2010   3000
    2281224 228122    12 2010   1889
    2281225 228122     3 2011    778
    2281243 228124    12 2010   1111
    2281244 228124     3 2011    200
    2281282 228128     9 2010   7889
    2281283 228128    12 2010   2900
    2281284 228128     3 2011   3400
    2281302 228130     9 2010   1200
    2281303 228130    12 2010   2000
    2281304 228130     3 2011   1900
    2281352 228135     9 2010   2300
    2281353 228135    12 2010   1333
    2281354 228135     3 2011   2340

I want to use the

Can `ddply` (or similar) do a sliding window?

老子叫甜甜 posted on 2019-11-27 11:58:46
Something like

    sliding = function(df, n, f)
      ldply(1:(nrow(df) - n + 1), function(k)
        f(df[k:(k + n - 1), ])
      )

That would be used like

    > df
      n         a
    1 1 0.8021891
    2 2 0.9446330
    ...
    > sliding(df, 2, function(df) with(df,
    +   data.frame(n = n[1], a = a[1], b = sum(n - a))
    + ))
      n         a        b
    1 1 0.8021891 1.253178
    ...

Except straight inside ddply, so that I could get the nice syntactic sugar that comes with it?

Since there hasn't been an answer posted to this question, I thought I'd put one up to make the case that there is actually an even better way to go about this type of problem - one that can also be
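For readers without plyr installed, the `sliding` helper from the question can be approximated in base R by replacing `ldply` with `lapply` plus `do.call(rbind, ...)`. This is only a sketch of the question's own helper, not the alternative the truncated answer goes on to propose; the sample values of `a` are made up.

```r
# Apply f to every run of n consecutive rows and stack the results
sliding <- function(df, n, f) {
  starts <- seq_len(nrow(df) - n + 1)
  windows <- lapply(starts, function(k) f(df[k:(k + n - 1), , drop = FALSE]))
  do.call(rbind, windows)
}

# Made-up data for illustration
df <- data.frame(n = 1:4, a = c(0.5, 0.25, 0.75, 1))

out <- sliding(df, 2, function(w) {
  data.frame(n = w$n[1], a = w$a[1], b = sum(w$n - w$a))
})
```

Here `out` has one row per window, three rows for four input rows and a window of two.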

Trouble converting long list of data.frames (~1 million) to single data.frame using do.call and ldply

耗尽温柔 posted on 2019-11-27 11:56:22
I know there are many questions here on SO about ways to convert a list of data.frames to a single data.frame using do.call or ldply, but this question is about understanding the inner workings of both methods and trying to figure out why I can't get either to work for concatenating a list of almost 1 million data.frames of the same structure, same field names, etc. into a single data.frame. Each data.frame has one row and 21 columns. The data started out as a JSON file, which I converted to lists using fromJSON, then ran another lapply to extract part of the list and converted to data.frame and
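One alternative worth trying when every data.frame has the same columns is to assemble the result column by column, so that the data.frame is constructed only once instead of being grown row by row. The sketch below is base R with a small made-up list (1,000 one-row frames with two columns rather than a million with 21); whether it is fast enough at the question's scale is not something this example demonstrates.

```r
# Made-up list of one-row data frames with identical structure
dfs <- lapply(1:1000, function(i) {
  data.frame(id = i, name = paste0("n", i), stringsAsFactors = FALSE)
})

cols <- names(dfs[[1]])

# Pull each column out of every list element and flatten it into one vector
columns <- lapply(cols, function(col) {
  unlist(lapply(dfs, `[[`, col), use.names = FALSE)
})

# Build the combined data frame in a single step
combined <- as.data.frame(setNames(columns, cols), stringsAsFactors = FALSE)
```

This relies on no column being a factor, since `unlist()` on factors with differing levels needs extra care. The data.table package also offers `rbindlist()` for this kind of job.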

quick/elegant way to construct mean/variance summary table

我怕爱的太早我们不能终老 posted on 2019-11-27 11:49:13
Question: I can achieve this task, but I feel like there must be a "best" (slickest, most compact, clearest-code, fastest?) way of doing it and have not figured it out so far ... For a specified set of categorical factors I want to construct a table of means and variances by group.

generate data:

    set.seed(1001)
    d <- expand.grid(f1=LETTERS[1:3],f2=letters[1:3],
                     f3=factor(as.character(as.roman(1:3))),rep=1:4)
    d$y <- runif(nrow(d))
    d$z <- rnorm(nrow(d))

desired output:

      f1 f2 f3 y.mean y.var
    1 A a I 0
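One compact base R option, offered as a sketch rather than as the "best" answer the question asks for: let `aggregate()` return both statistics at once, then flatten the resulting matrix column with `do.call(data.frame, ...)`, which yields the `y.mean` and `y.var` column names from the desired output.

```r
# Data generation as given in the question
set.seed(1001)
d <- expand.grid(f1 = LETTERS[1:3], f2 = letters[1:3],
                 f3 = factor(as.character(as.roman(1:3))), rep = 1:4)
d$y <- runif(nrow(d))
d$z <- rnorm(nrow(d))

# FUN returns a named vector, so aggregate() stores a two-column matrix in `y`
s <- aggregate(y ~ f1 + f2 + f3, data = d,
               FUN = function(x) c(mean = mean(x), var = var(x)))

# Flatten the matrix column into ordinary columns named y.mean and y.var
s <- do.call(data.frame, s)
```

Each of the 27 factor combinations has four replicates, so every group gets a defined variance.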

Get the means of sub groups of means in R

老子叫甜甜 posted on 2019-11-27 09:39:15
I'm new to R and I don't know how to get R to calculate the means of subgroups of means, which are themselves the means of a subgroup. Let me explain more clearly. I have a data frame like this:

    GROUP WORD WLN
        1    1   4
        1    1   3
        1    1   3
        1    2   2
        1    2   2
        1    2   3
        2    3   1
        2    3   1
        2    3   2
        2    4   1
        2    4   1
        2    4   1
      ...  ... ...

but the real one has a total of 5 groups and 25 words (5 words in each group; every word has been assigned a number from 1 to 4 by 5 subjects...). I need to get the means of WLN for every word and I can do that easily with a loop and save the results in a vector; but then I need a vector with the means of these
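Assuming the goal is the mean of the per-word means within each group (the excerpt is cut off before it says so explicitly), a two-step `aggregate()` avoids the loop. The data below are the twelve rows visible in the question.

```r
d <- data.frame(
  GROUP = rep(1:2, each = 6),
  WORD  = rep(1:4, each = 3),
  WLN   = c(4, 3, 3, 2, 2, 3, 1, 1, 2, 1, 1, 1)
)

# Step 1: mean WLN for every word, keeping track of its group
word_means <- aggregate(WLN ~ GROUP + WORD, data = d, FUN = mean)

# Step 2: mean of those word means within each group
group_means <- aggregate(WLN ~ GROUP, data = word_means, FUN = mean)
```

`group_means$WLN` is then the vector of group-level means, one entry per group.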