plyr

R: Grouped rolling window linear regression with rollapply and ddply

穿精又带淫゛_ submitted on 2019-12-02 04:40:06
I have a data set with several grouping variables on which I want to run a rolling window linear regression. The ultimate goal is to extract the 10 linear regressions with the lowest slopes and average them together to provide a mean minimum rate of change. I have found examples using rollapply to calculate rolling window linear regressions, but I have the added complication that I would like to apply these linear regressions to groups within the data set. Here is a sample data set and my current code, which is close but isn't quite working. dat<-data.frame(w=c(rep(1,27), rep(2,27),rep(3,27)),
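
One way to approach the grouped rolling regression is sketched below. This is not the asker's code: the data frame, the window width of 5, and the helper roll_slopes are all hypothetical, and a plain sapply loop stands in for zoo's rollapply so that only plyr is needed. The question averages the 10 lowest slopes; the sketch uses 3 because the toy groups are small.

```r
library(plyr)

# Hypothetical data: two groups (w) with 12 ordered observations each
set.seed(1)
dat <- data.frame(w = rep(1:2, each = 12),
                  x = rep(1:12, 2),
                  y = rnorm(24))

width <- 5  # assumed window size

# Hypothetical helper: slope of y ~ x for every window of `width`
# consecutive rows within one group
roll_slopes <- function(d) {
  starts <- seq_len(nrow(d) - width + 1)
  data.frame(start = starts,
             slope = sapply(starts, function(i) {
               rows <- i:(i + width - 1)
               unname(coef(lm(y ~ x, data = d[rows, ]))[2])
             }))
}

# One row per window, per group
slopes <- ddply(dat, .(w), roll_slopes)

# Mean of the 3 lowest slopes in each group
min_rates <- ddply(slopes, .(w), summarise,
                   mean_min = mean(head(sort(slope), 3)))
```

Splitting the work into two ddply calls keeps the per-window slopes available for inspection before they are averaged.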

Rename factor levels based on a condition in R

最后都变了- submitted on 2019-12-02 04:39:46
Question: I want to combine all factor levels with a count less than n into one level named "Else". For example, if n = 3, then in the following df I want to combine "c", "d" and "e" as "Else": df = data.frame(x=c(1:10), y=c("a","a","a","b","b","b","c","d","d","e")) I started out by getting a df with all the low-count values: library(plyr) lowcounts = ddply(df, "y", function(z){if(nrow(z)<3) nrow(z) else NULL}) I know I could change these manually, but in practice I have dozens of levels so I need to automate
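
A minimal base R sketch of one possible approach, not taken from the thread: it counts rows per level with table() and renames the rare levels in place, without ddply. It assumes y is a factor, so stringsAsFactors = TRUE is set explicitly.

```r
# Sample data from the question; y is made a factor explicitly
df <- data.frame(x = 1:10,
                 y = c("a", "a", "a", "b", "b", "b", "c", "d", "d", "e"),
                 stringsAsFactors = TRUE)
n <- 3  # levels with fewer than n rows are merged

counts <- table(df$y)               # number of rows per level
rare <- names(counts)[counts < n]   # levels to merge

# Assigning the same name to several levels merges them into one
levels(df$y)[levels(df$y) %in% rare] <- "Else"
```

Because the rename works on the level names rather than on individual rows, it scales to dozens of levels without any manual listing.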

Rescaling with plyr (ddply) in R

南楼画角 submitted on 2019-12-02 04:12:18
I've got this csv table in which I need to rescale the data between 0 and 1 for each column. That is, the lowest value of any given column will be 0, the highest will be 1, and all other values will be linearly scaled accordingly. Here's my script: tableau <- read.csv("/tableau.csv") tableau.m <- melt(tableau) tableau.m <- ddply(tableau.m, .(variable), transform,rescale = rescale(value)) (And here's the data: https://dl.dropboxusercontent.com/u/73950/tableau.csv ) The issue is that I need the second column ("B") to be inverted. That is, for this column only and not for the others, the lowest
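
A sketch of one way to invert a single column, under stated assumptions: the data frame below is a hypothetical stand-in for the melted tableau.csv, rescale01 is a hypothetical helper replacing the rescale() call in the question, and "inverted" is taken to mean that the lowest value maps to 1 and the highest to 0.

```r
library(plyr)

# Hypothetical stand-in for the melted table (long form)
tableau.m <- data.frame(
  variable = rep(c("A", "B"), each = 3),
  value    = c(10, 20, 30, 1, 2, 5)
)

# Hypothetical helper: linear rescale to [0, 1]
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

# Rescale each column, then flip the scale for column "B" only
tableau.m <- ddply(tableau.m, .(variable), function(d) {
  d$rescale <- rescale01(d$value)
  if (d$variable[1] == "B") d$rescale <- 1 - d$rescale
  d
})
```

An anonymous function is used instead of transform so that the per-group condition on variable can be written as an ordinary if statement.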

transform data frame string variable names

て烟熏妆下的殇ゞ submitted on 2019-12-02 02:49:45
I have a data frame that contains dates and IDs. I need to add multiple columns to this data frame based on each date. I use ddply to do this as follows: ddply(df, "dt", transform, new_column1 = myfun(column_name_1)) However, I have a bunch of column names and would like to add multiple new columns. Is there a way that I can pass a string to transform instead of new_column1? For example I tried: ddply(df, "dt", transform, get("some_column_name")=myfun(column_name_1)) but this does not work. Additionally, if I pass the column_name_1 to myfun as a string, can I just use get("column_name_1")
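
One possible workaround, sketched with hypothetical data and a hypothetical myfun: replace transform with an anonymous function and use [[ ]] indexing, which accepts column names as strings on both the input and the output side.

```r
library(plyr)

# Hypothetical data and function standing in for df and myfun
df <- data.frame(dt = rep(c("2019-01-01", "2019-01-02"), each = 2),
                 column_name_1 = c(1, 2, 3, 4))
myfun <- function(v) v - mean(v)

in_col  <- "column_name_1"  # source column, given as a string
out_col <- "new_column1"    # name of the column to create

res <- ddply(df, "dt", function(d) {
  d[[out_col]] <- myfun(d[[in_col]])  # [[ ]] takes string names
  d
})
```

To add several columns, the same body can loop over vectors of input and output names.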

Allow a maximum number of entries when certain conditions apply

情到浓时终转凉″ submitted on 2019-12-02 01:13:33
I have a dataset with a lot of entries. Each of these entries belongs to a certain ID (belongID), the entries are unique (with uniqID), but multiple entries can come from the same source (sourceID). It is also possible that multiple entries from the same source have the same belongID. For the research I need to do on the dataset, I have to get rid of the entries of a single sourceID that occur more than 5 times for 1 belongID. The maximum of 5 entries that need to be kept are the ones with the highest 'Time' value. To illustrate this I have the following example dataset:
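
The example dataset is cut off in this listing, so the sketch below uses a hypothetical one with the column names mentioned in the question. It shows one way to keep at most 5 rows per sourceID and belongID pair, choosing the rows with the highest Time.

```r
library(plyr)

# Hypothetical data: source 1 has 7 entries for belongID 1, source 2 has 2
dat <- data.frame(uniqID   = 1:9,
                  sourceID = c(rep(1, 7), 2, 2),
                  belongID = 1,
                  Time     = c(3, 9, 1, 7, 5, 8, 2, 4, 6))

# Within each (sourceID, belongID) group, sort by Time descending
# and keep the first 5 rows; smaller groups are returned whole
kept <- ddply(dat, .(sourceID, belongID), function(d) {
  head(d[order(d$Time, decreasing = TRUE), ], 5)
})
```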

How to subset data for a specific column with ddply?

落花浮王杯 submitted on 2019-12-02 01:13:00
Question: I would like to know if there is a simple way to achieve what I describe below using ddply. My data frame describes an experiment with two conditions. Participants had to select between options A and B, and we recorded how long they took to decide, and whether their responses were accurate or not. I use ddply to create averages by condition. The column nAccurate summarizes the number of accurate responses in each condition. I also want to know how much time they took to decide and express

Speed up loops and condition with R

此生再无相见时 submitted on 2019-12-02 00:19:39
Question: I would like to speed up this code in R. The input is a 3x3x3 array containing integers, and based on the neighbors, if they are zero, they are replaced with the respective number. The output is the array "mask_roi" with the new values. ###### Start here list_neig = array(0, dim = c(3,3,3)) mask_roi = array(sample(c(0,1,2),27,replace=T), dim = c(3,3,3)) values_mask = array(1:27, dim = c(3,3,3)) values_mask_melted = melt(values_mask, varnames=c("x","y","z")) ### Transform the 3D Matrix in a

mean from row values in a dataframe excluding min and max values in R

夙愿已清 submitted on 2019-12-01 22:28:36
I have the following data frame, df (fragment displayed here):

    H2475  H2481  H2669  H2843  H2872  H2873  H2881  H2909
E1 94.470 26.481 15.120 18.490 16.189 11.422 14.886  0.512
E2  1.016  0.363  0.509  1.190  1.855  0.958  0.771  0.815
E3  9.671  0.637  0.571  0.447  0.116  0.452  0.403  0.003
E4  3.448  2.826  2.183  2.607  4.288  2.526  2.820  3.523
E5  2.548  1.916  1.126  1.553  1.089  1.228  0.887  1.065

What I want to do is compute the mean of each row after removing the two extreme values. For whole rows I used plyr: library(plyr) df.my_means <- adply(df, 1, transform, my_means = mean(as.matrix(df[i,]) ) ) It should also be OK
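
A base R sketch of one possible approach, using apply rather than adply. It assumes that "two extreme values" means the single smallest and single largest value in each row, as the title suggests. Only the first two rows and first four columns of the fragment are reproduced, and trimmed_mean is a hypothetical helper name.

```r
# First two rows and four columns of the fragment in the question
df <- data.frame(H2475 = c(94.470, 1.016), H2481 = c(26.481, 0.363),
                 H2669 = c(15.120, 0.509), H2843 = c(18.490, 1.190),
                 row.names = c("E1", "E2"))

# Hypothetical helper: mean after dropping one minimum and one maximum
trimmed_mean <- function(v) {
  v <- sort(v)
  mean(v[-c(1, length(v))])
}

# Apply the helper to every row (MARGIN = 1)
df$my_means <- apply(df, 1, trimmed_mean)
```

Sorting first means that ties at the minimum or maximum lose only one copy each, which may or may not match what the asker intended.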

mean from row values in a dataframe excluding min and max values in R

泄露秘密 submitted on 2019-12-01 21:54:21
Question: I have the following data frame, df (fragment displayed here):

    H2475  H2481  H2669  H2843  H2872  H2873  H2881  H2909
E1 94.470 26.481 15.120 18.490 16.189 11.422 14.886  0.512
E2  1.016  0.363  0.509  1.190  1.855  0.958  0.771  0.815
E3  9.671  0.637  0.571  0.447  0.116  0.452  0.403  0.003
E4  3.448  2.826  2.183  2.607  4.288  2.526  2.820  3.523
E5  2.548  1.916  1.126  1.553  1.089  1.228  0.887  1.065

What I want to do is compute the mean of each row after removing the two extreme values. For whole rows I used plyr: library

How to subset data for a specific column with ddply?

别说谁变了你拦得住时间么 submitted on 2019-12-01 21:15:39
I would like to know if there is a simple way to achieve what I describe below using ddply. My data frame describes an experiment with two conditions. Participants had to select between options A and B, and we recorded how long they took to decide, and whether their responses were accurate or not. I use ddply to create averages by condition. The column nAccurate summarizes the number of accurate responses in each condition. I also want to know how much time they took to decide and express it in the column RT. However, I want to calculate average response times only when participants got the
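
The excerpt is cut off, so the sketch below assumes the intent is to average response times over accurate trials only. The data frame and the column names condition, accurate and time are hypothetical; nAccurate and RT come from the question. Subsetting inside summarise is one simple way to restrict a single output column.

```r
library(plyr)

# Hypothetical trial-level data; accurate is coded 1 (correct) or 0
d <- data.frame(condition = rep(c("c1", "c2"), each = 3),
                accurate  = c(1, 0, 1, 1, 1, 0),
                time      = c(2.0, 5.0, 4.0, 1.0, 3.0, 9.0))

res <- ddply(d, .(condition), summarise,
             nAccurate = sum(accurate),           # uses all trials
             RT = mean(time[accurate == 1]))      # accurate trials only
```

Because the subset is applied to one column only, nAccurate still counts over every trial in the condition.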