plyr

R ddply, applying if and ifelse functions

Submitted by 流过昼夜 on 2019-12-04 05:21:17
I'm trying to apply a function to a data frame using ddply from the plyr package, but I'm getting some results that I don't understand. I have three questions about the results.

Given:

    mydf <- data.frame(c(12,34,9,3,22,55), c(1,2,1,1,2,2), c(0,1,2,1,1,2))
    colnames(mydf)[1] <- 'n'
    colnames(mydf)[2] <- 'x'
    colnames(mydf)[3] <- 'x1'

mydf looks like this:

       n x x1
    1 12 1  0
    2 34 2  1
    3  9 1  2
    4  3 1  1
    5 22 2  1
    6 55 2  2

Question #1

If I do:

    k <- function(x) {
      mydf$z <- ifelse(x == 1, 0, mydf$n)
      return(mydf)
    }
    mydf <- ddply(mydf, c("x"), .fun = k, .inform = TRUE)

I get the following error:

    Error in `$<-.data
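ddply() calls the function once per group and passes that group's rows as the first argument. Inside k(), the name x therefore refers to a whole data-frame piece rather than the x column, and the function modifies and returns the global mydf instead of the piece. Below is a minimal base-R sketch of the per-group pattern. split(), lapply() and rbind() stand in for ddply here, so this is not plyr itself, and the argument name piece is hypothetical.

```r
# Data from the question
mydf <- data.frame(n  = c(12, 34, 9, 3, 22, 55),
                   x  = c(1, 2, 1, 1, 2, 2),
                   x1 = c(0, 1, 2, 1, 1, 2))

# The per-group function works only on the piece it is given
# and returns that modified piece
k <- function(piece) {
  piece$z <- ifelse(piece$x == 1, 0, piece$n)
  piece
}

# Base-R stand-in for ddply(mydf, "x", k): split by x, apply k, re-bind
result <- do.call(rbind, lapply(split(mydf, mydf$x), k))
rownames(result) <- NULL
print(result)
```

Because ifelse() is vectorised, this particular column does not need grouping at all. mydf$z <- ifelse(mydf$x == 1, 0, mydf$n) gives the same values in the original row order.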

R: Grouped rolling window linear regression with rollapply and ddply

Submitted by 佐手、 on 2019-12-04 05:10:55
I have a data set with several grouping variables on which I want to run a rolling-window linear regression. The ultimate goal is to extract the 10 linear regressions with the lowest slopes and average them together to provide a mean minimum rate of change. I have found examples using rollapply to calculate rolling-window linear regressions, but I have the added complication that I would like to apply these linear regressions to groups within the data set. Here is a sample data set and my
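The question mentions rollapply. As a dependency-free illustration, here is a base-R sketch that does not use zoo. It fits lm() on every window of consecutive rows within each group and averages the k lowest slopes. The sample data from the question is not shown here, so the column names group, time and value, the window width of 5 and the toy data are all hypothetical.

```r
# Hypothetical toy data: two groups, each an exactly linear series
df <- data.frame(
  group = rep(c("a", "b"), each = 20),
  time  = rep(1:20, times = 2),
  value = c(2 * (1:20), -1 * (1:20))
)

# Slope of lm(value ~ time) for every window of `width` consecutive rows
rolling_slopes <- function(d, width) {
  d <- d[order(d$time), ]                  # windows assume time order
  starts <- seq_len(nrow(d) - width + 1)
  vapply(starts, function(i) {
    w <- d[i:(i + width - 1), ]
    coef(lm(value ~ time, data = w))[["time"]]
  }, numeric(1))
}

# Mean of the k lowest rolling slopes within one group
mean_min_slope <- function(d, width = 5, k = 10) {
  mean(head(sort(rolling_slopes(d, width)), k))
}

# One result per group
res <- sapply(split(df, df$group), mean_min_slope)
print(res)
```

The sketch assumes every group has at least `width` rows. Shorter groups would need to be filtered out first.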

Splitting text in a column and adding a row number [duplicate]

Submitted by ぐ巨炮叔叔 on 2019-12-04 05:07:51
This question already has answers here: Split comma-separated strings in a column into separate rows (5 answers). Closed 2 years ago.

I would like to split some text in a data frame column and save it into a data frame together with the row number or an id column. I normally used plyr to do that, but this no longer works in dplyr. If I understand correctly, my code only worked because of a bug in plyr, so I am looking for the correct way to do this. This is a minimal
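Assuming comma-separated text, as in the linked duplicate, one base-R approach is to split the strings with strsplit(). Each row's id is then repeated as many times as that row produced pieces. This is not the plyr code the question refers to. The data frame and the column names id, text and word are hypothetical.

```r
# Hypothetical input: one id per row, comma-separated text
df <- data.frame(id = 1:3,
                 text = c("a,b", "c", "d,e,f"),
                 stringsAsFactors = FALSE)

# Split every string; parts is a list with one character vector per row
parts <- strsplit(df$text, ",", fixed = TRUE)

# Repeat each id once per piece so ids line up with the unlisted pieces
long <- data.frame(id   = rep(df$id, lengths(parts)),
                   word = unlist(parts),
                   stringsAsFactors = FALSE)
print(long)
```

The call rep(df$id, lengths(parts)) keeps the row number attached, and the same idea works for any other per-row column. tidyr's separate_rows() wraps this operation if tidyr is already in use.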

Getting R frequency counts for all possible answers

Submitted by 会有一股神秘感。 on 2019-12-04 04:33:07
I've started with R and I'm still finding my way with the syntax. I'm looking to get the frequencies for a scaled variable that takes values from 0 through 10, plus NA.

    Id <- c(1,2,3,4,5)
    ClassA <- c(1,NA,3,1,1)
    ClassB <- c(2,1,1,3,3)
    R <- c(5,5,7,NA,9)
    S <- c(3,7,NA,9,5)
    df <- data.frame(Id,ClassA,ClassB,R,S)
    library(plyr)
    count(df,'R')

I get a result of:

       R freq
    1  5    2
    2  7    1
    3  9    1
    4 NA    1

I'm looking for a result of:

        R freq
    1   0    0
    2   1    0
    3   2    0
    4   3    0
    5   4    0
    6   5    2
    7   6    0
    8   7    1
    9   8    0
    10  9    1
    11 10    0
    12 NA    1

If I
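plyr's count() only reports values that actually occur in the data. The following base-R sketch also returns zero counts. It converts the column to a factor whose levels are all possible answers (0 to 10, from the question) and asks table() to keep NA. Only the R column from the question is rebuilt here.

```r
# The relevant column from the question
df <- data.frame(Id = c(1, 2, 3, 4, 5), R = c(5, 5, 7, NA, 9))

# levels = 0:10 forces a row for every possible answer, even unseen ones;
# useNA = "always" adds a count for missing values
tab <- table(factor(df$R, levels = 0:10), useNA = "always")

freq <- as.data.frame(tab, stringsAsFactors = FALSE)
names(freq) <- c("R", "freq")
print(freq)
```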

Am I using plyr right? I seem to be using way too much memory

Submitted by 删除回忆录丶 on 2019-12-04 03:49:36
I have the following, somewhat large dataset:

    > dim(dset)
    [1] 422105     25
    > class(dset)
    [1] "data.frame"

Without doing anything, the R process seems to take about 1 GB of RAM. I am trying to run the following code:

    dset <- ddply(dset, .(tic), transform,
                  date.min <- min(date),
                  date.max <- max(date),
                  daterange <- max(date) - min(date),
                  .parallel = TRUE)

When I run that code, RAM usage skyrockets. It completely saturated 60 GB of RAM on a 32-core machine. What am I doing wrong?

If performance is an issue, it might be a good idea to switch to using data.tables from the package of the same
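The code above has two issues worth checking. First, transform() creates columns from named arguments (name = expression). An argument written as date.min <- min(date) is passed as an unnamed assignment, so transform() has no column name to give it. Second, the per-group minimum and maximum can be attached without splitting the whole data frame into separate pieces. The sketch below does this with base R's ave(). That is a swapped-in technique, not the data.table approach the answer starts to suggest. The toy dset is hypothetical and has only the tic and date columns.

```r
# Hypothetical toy data with the two columns the question uses
dset <- data.frame(
  tic  = c("A", "A", "B", "B", "B"),
  date = as.Date(c("2019-01-01", "2019-01-05",
                   "2019-02-01", "2019-02-03", "2019-01-20"))
)

# ave() applies FUN within each tic group and returns a vector aligned
# with the original rows, so no per-group data frames are built
dset$date.min  <- ave(dset$date, dset$tic, FUN = min)
dset$date.max  <- ave(dset$date, dset$tic, FUN = max)
dset$daterange <- as.numeric(dset$date.max - dset$date.min, units = "days")
print(dset)
```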

Why doesn't the plyr package use my parallel backend?

Submitted by ぐ巨炮叔叔 on 2019-12-04 03:48:27
I'm trying to use the parallel package in R for parallel operations rather than doSNOW, since it's built in and ostensibly the direction the R Project wants things to go. I'm doing something wrong that I can't pin down, though. Take this example:

    a <- rnorm(50)
    b <- rnorm(50)
    arr <- matrix(cbind(a,b),nrow=50)
    aaply(arr,.margin=1,function(x){x[1]+x[2]},.parallel=F)

This works just fine, producing the sums of my two columns. But if I try to bring in the parallel package:

    library(parallel)
    nodes <- detectCores()
    cl <- makeCluster(nodes)
    setDefaultCluster(cl)
    aaply(arr,.margin=1,function(x){x[1]+x[2]}
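As far as I know, plyr's .parallel = TRUE dispatches through foreach. It therefore uses whatever backend has been registered with foreach (for example via doParallel::registerDoParallel(cl)), not the cluster set by setDefaultCluster(). For a simple row-wise operation like this one, the parallel package can also be used directly. The sketch below uses parApply() in place of aaply. That is a swapped-in base-R approach, and the choice of two workers is arbitrary.

```r
library(parallel)

a <- rnorm(50)
b <- rnorm(50)
arr <- cbind(a, b)                  # 50 x 2 matrix

cl <- makeCluster(2)                # two local worker processes
# Apply the function to each row (MARGIN = 1) on the cluster workers
sums <- parApply(cl, arr, 1, function(x) x[1] + x[2])
stopCluster(cl)                     # release the workers when done

print(all.equal(unname(sums), unname(rowSums(arr))))
```

The workers only need the anonymous function here. Any other objects a worker function refers to would have to be sent with clusterExport() first.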

Loops to create new variables in ddply

Submitted by 血红的双手。 on 2019-12-04 02:25:46
I am using ddply to aggregate and summarize data frame variables, and I am interested in looping over a list of my data frame's variables to create the new variables.

    new.data <- ddply(old.data, c("factor", "factor2"), function(df)
      c(a11_a10 = CustomFunction(df$a11_a10),
        a12_a11 = CustomFunction(df$a12_a11),
        a13_a12 = CustomFunction(df$a13_a12),
        ...
        ...
        ...))

Is there a way for me to insert a loop in ddply so that I can avoid writing out each new summary variable, e.g.

    for (i in 11:n) {
      paste("a", i, "_a", i - 1) = CustomFunction(..... )
    }

I know that this is not how it would actually be done, but I just
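A loop that assigns to paste(...) cannot work, because a function call is not a valid assignment target. paste() would also insert spaces, giving "a 11 _a 10". Instead, the column names can be built with paste0() and the summary vector produced with vapply(), which keeps the names. In the sketch below, CustomFunction is replaced by mean() as a hypothetical stand-in, old.data is toy data, and only one grouping column is used. split() and rbind() stand in for ddply().

```r
# Hypothetical toy data with the column naming scheme from the question
old.data <- data.frame(factor  = c("x", "x", "y", "y"),
                       a11_a10 = 1:4,
                       a12_a11 = 5:8,
                       a13_a12 = 9:12)

CustomFunction <- function(v) mean(v)      # hypothetical stand-in

# Build "a11_a10", "a12_a11", "a13_a12" instead of typing them out
n <- 13
vars <- paste0("a", 11:n, "_a", 10:(n - 1))

# vapply() over a character vector uses the strings as names,
# so this returns a named vector like the hand-written c(...) version
summarise_piece <- function(df) {
  vapply(vars, function(v) CustomFunction(df[[v]]), numeric(1))
}

# Base-R stand-in for ddply(old.data, "factor", summarise_piece)
new.data <- do.call(rbind, lapply(split(old.data, old.data$factor), summarise_piece))
print(new.data)
```

summarise_piece returns the same kind of named vector as the hand-written function, so it should also be usable as the .fun argument of ddply.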

subset inside a function by the variables specified in ddply

Submitted by 折月煮酒 on 2019-12-04 01:38:51
Often I need to subset a data.frame inside a function by the same variables I use to split another data.frame with ddply. To do that, I explicitly write the variables again inside the function, and I wonder whether there is a more elegant way. Below is a trivial example that shows my current approach.

    d1<-expand.grid(x=c('a','b'),y=c('c','d'),z=1:3)
    d2<-expand.grid(x=c('a','b'),y=c('c','d'),z=4:6)
    results<-ddply(d1,.(x,y),function(d) {
      d2Sub<
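One way to avoid repeating the grouping variables is to store their names once. Inside the function, the piece's key values are read from its first row and joined against d2 with merge(). The sketch below is one such approach, not necessarily the one the asker was after. keys, f, n1 and sum_z2 are hypothetical names, and split() plus rbind() stand in for ddply().

```r
d1 <- expand.grid(x = c('a', 'b'), y = c('c', 'd'), z = 1:3)
d2 <- expand.grid(x = c('a', 'b'), y = c('c', 'd'), z = 4:6)

keys <- c("x", "y")   # the grouping variables, written down once

f <- function(d) {
  # This piece's key values, read from its first row
  k <- d[1, keys, drop = FALSE]
  # Matching rows of d2 via a join on the same keys
  d2Sub <- merge(k, d2, by = keys)
  data.frame(k, n1 = nrow(d), sum_z2 = sum(d2Sub$z))
}

# Base-R stand-in for ddply(d1, keys, f)
results <- do.call(rbind, lapply(split(d1, d1[keys], drop = TRUE), f))
rownames(results) <- NULL
print(results)
```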

dplyr: apply function table() to each column of a data.frame

Submitted by 社会主义新天地 on 2019-12-04 01:23:46
Apply function table() to each column of a data.frame using dplyr

I often apply the table function to each column of a data frame using plyr, like this:

    library(plyr)
    ldply(mtcars, function(x) data.frame(table(x), prop.table(table(x))))

Is it possible to do this in dplyr as well? My attempts fail:

    mtcars %>% do(table %>% data.frame())
    melt(mtcars) %>% do(table %>% data.frame())

Caner: You can try the following, which does not rely on the tidyr package.

    mtcars %>%
      lapply(table) %>%
      lapply(as.data.frame) %>%
      Map(cbind, var = names(mtcars), .) %>%
      rbind_all() %>%
      group_by(var) %>%
      mutate
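The base-R sketch below produces the same long-format result, with one row per variable and value plus the counts and proportions. It uses lapply() over the columns instead of plyr or dplyr. Separately, rbind_all() in the answer above was deprecated in later dplyr releases in favour of bind_rows().

```r
# One small data frame per column of mtcars, then stacked
freq_long <- do.call(rbind, lapply(names(mtcars), function(v) {
  tab <- table(mtcars[[v]])
  data.frame(var   = v,
             value = names(tab),
             Freq  = as.vector(tab),
             prop  = as.vector(prop.table(tab)),
             stringsAsFactors = FALSE)
}))
print(head(freq_long))
```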

How do I pass variables to a custom function in ddply?

Submitted by 好久不见. on 2019-12-03 20:54:13
Consider the following data:

    d = data.frame(
      experiment = as.factor(c("foo", "foo", "foo", "bar", "bar")),
      si = runif(5),
      ti = runif(5)
    )

I would like to perform a correlation test of si and ti for each level of the experiment factor. So I thought I'd run:

    ddply(d, .(experiment), cor.test)

But how do I pass the values of si and ti to the cor.test call? I tried this:

    > ddply(d, .(experiment), cor.test, x = si, y = ti)
    Error in .fun(piece, ...) : object 'si' not found
    > ddply(d, .(experiment), cor.test, si, ti)
    Error in match.arg(alternative) : 'arg' must be NULL or a character vector

Is there
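ddply() passes each group's rows to the function as its first argument. Extra arguments such as x = si are evaluated outside the data, which is why si is not found. Wrapping cor.test() in a small function that receives the piece and selects its columns avoids the problem. The sketch below uses split() and lapply() instead of ddply. With cor.test()'s default Pearson method, at least three complete pairs are required, so the two-row bar group in the question's data would still fail. The toy data below is therefore hypothetical and has four rows per group.

```r
# Hypothetical toy data with four observations per experiment
d <- data.frame(
  experiment = factor(rep(c("foo", "bar"), each = 4)),
  si = c(1, 2, 3, 4, 1, 2, 3, 4),
  ti = c(2, 3, 7, 8, 5, 6, 2, 1)
)

# Receives one group's rows and pulls the two columns out of them
cor_by_group <- function(piece) {
  ct <- cor.test(piece$si, piece$ti)
  data.frame(experiment = piece$experiment[1],
             estimate   = unname(ct$estimate),
             p.value    = ct$p.value)
}

# Base-R stand-in for ddply(d, .(experiment), cor_by_group)
res <- do.call(rbind, lapply(split(d, d$experiment), cor_by_group))
rownames(res) <- NULL
print(res)
```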