plyr | 易学教程

R ddply, applying if and ifelse functions

I'm trying to apply a function to a data frame using ddply from the plyr package, but I'm getting some results that I don't understand. I have 3 questions about the results. Given:

    mydf <- data.frame(c(12,34,9,3,22,55), c(1,2,1,1,2,2), c(0,1,2,1,1,2))
    colnames(mydf)[1] <- 'n'
    colnames(mydf)[2] <- 'x'
    colnames(mydf)[3] <- 'x1'

mydf looks like this:

       n x x1
    1 12 1  0
    2 34 2  1
    3  9 1  2
    4  3 1  1
    5 22 2  1
    6 55 2  2

Question #1

If I do:

    k <- function(x) {
      mydf$z <- ifelse(x == 1, 0, mydf$n)
      return (mydf)
    }
    mydf <- ddply(mydf, c("x"), .fun = k, .inform = TRUE)

I get the following error:

    Error in `$<-.data
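A minimal sketch of one way to get the intended column, assuming the goal is to set z to 0 where x is 1 and to n otherwise. ddply passes each group's rows to the function as a data frame, so the function should operate on that piece rather than on the global mydf. The argument name piece is only illustrative.

```r
library(plyr)

# Same data as in the question, with the column names set directly.
mydf <- data.frame(n  = c(12, 34, 9, 3, 22, 55),
                   x  = c(1, 2, 1, 1, 2, 2),
                   x1 = c(0, 1, 2, 1, 1, 2))

# 'piece' is the subset of rows for one value of x.
k <- function(piece) {
  piece$z <- ifelse(piece$x == 1, 0, piece$n)
  piece
}

result <- ddply(mydf, "x", k)
print(result)
```

With the sample data, z is 0 for the three rows where x is 1 and equal to n for the other three.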

R: Grouped rolling window linear regression with rollapply and ddply

Question: I have a data set with several grouping variables on which I want to run a rolling window linear regression. The ultimate goal is to extract the 10 linear regressions with the lowest slopes and average them together to provide a mean minimum rate of change. I have found examples using rollapply to calculate rolling window linear regressions, but I have the added complication that I would like to apply these linear regressions to groups within the data set. Here is a sample data set and my

splitting text in column and add row number [duplicate]

Question: This question already has answers here: Split comma-separated strings in a column into separate rows (5 answers). Closed 2 years ago.

I would like to split some text in a data frame column and save it into a data frame together with the row number or an id column. I normally used plyr to do that, but this no longer works in dplyr. If I understand correctly, my code only worked in plyr because of a bug, so I am looking for the correct way to do this. This is a minimal

Getting R Frequency counts for all possible answers

Question: I've started with R and I'm still finding my way with the syntax. I'm looking to get the frequencies for a scaled variable which has values of 0 through 10 and NA.

    Id <- c(1,2,3,4,5)
    ClassA <- c(1,NA,3,1,1)
    ClassB <- c(2,1,1,3,3)
    R <- c(5,5,7,NA,9)
    S <- c(3,7,NA,9,5)
    df <- data.frame(Id,ClassA,ClassB,R,S)
    library(plyr)
    count(df,'R')

I get a result of

       R freq
    1  5    2
    2  7    1
    3  9    1
    4 NA    1

I'm looking for a result of

        R freq
    1   0    0
    2   1    0
    3   2    0
    4   3    0
    5   4    0
    6   5    2
    7   6    0
    8   7    1
    9   8    0
    10  9    1
    11 10    0
    12 NA    1

If I
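One possible approach, sketched here with base R's table() rather than plyr's count(): converting the variable to a factor with explicit levels 0 through 10 makes table() report a zero count for values that never occur, and useNA = "always" adds the NA row. The name result is illustrative.

```r
# Column R from the question.
R <- c(5, 5, 7, NA, 9)

# Declare every possible answer as a factor level so that unused
# values are counted as zero; keep NA as its own category.
freq <- table(factor(R, levels = 0:10), useNA = "always")

# Convert to a two-column data frame similar to count()'s output.
result <- data.frame(R = names(freq), freq = as.vector(freq))
print(result)
```

The result has 12 rows: one for each value from 0 to 10, plus one for NA.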

Am I using plyr right? I seem to be using way too much memory

I have the following, somewhat large dataset:

    > dim(dset)
    [1] 422105     25
    > class(dset)
    [1] "data.frame"
    >

Without doing anything, the R process seems to take about 1GB of RAM. I am trying to run the following code:

    dset <- ddply(dset, .(tic), transform,
                  date.min <- min(date),
                  date.max <- max(date),
                  daterange <- max(date) - min(date),
                  .parallel = TRUE)

Running that code, RAM usage skyrockets. It completely saturated 60GB of RAM, running on a 32-core machine. What am I doing wrong?

If performance is an issue, it might be a good idea to switch to using data.tables from the package of the same
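One thing worth checking in the call above: transform() expects its new columns as name = value pairs, whereas `<-` inside the call passes unnamed arguments. Below is a small sketch with named arguments on made-up data (the four-row dset is an assumption for illustration, and .parallel is left out). Whether this alone explains the memory use is not established here.

```r
library(plyr)

# Tiny stand-in for the real data set (hypothetical values).
dset <- data.frame(
  tic  = c("A", "A", "B", "B"),
  date = as.Date(c("2020-01-01", "2020-01-10", "2020-02-01", "2020-02-03"))
)

# New columns are given as name = value, not name <- value.
# daterange is converted to a plain number of days for simplicity.
out <- ddply(dset, .(tic), transform,
             date.min  = min(date),
             date.max  = max(date),
             daterange = as.numeric(max(date) - min(date)))
print(out)
```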

Why doesn't the plyr package use my parallel backend?

I'm trying to use the parallel package in R for parallel operations rather than doSNOW, since it's built in and ostensibly the way the R Project wants things to go. I'm doing something wrong that I can't pin down, though. Take for example this:

    a <- rnorm(50)
    b <- rnorm(50)
    arr <- matrix(cbind(a,b),nrow=50)
    aaply(arr,.margin=1,function(x){x[1]+x[2]},.parallel=F)

This works just fine, producing the sums of my two columns. But if I try to bring in the parallel package:

    library(parallel)
    nodes <- detectCores()
    cl <- makeCluster(nodes)
    setDefaultCluster(cl)
    aaply(arr,.margin=1,function(x){x[1]+x[2]}

Loops to create new variables in ddply

I am using ddply to aggregate and summarize data frame variables, and I am interested in looping through my data frame's list to create the new variables.

    new.data <- ddply(old.data,
                      c("factor", "factor2"),
                      function(df)
                        c(a11_a10 = CustomFunction(df$a11_a10),
                          a12_a11 = CustomFunction(df$a12_a11),
                          a13_a12 = CustomFunction(df$a13_a12),
                          ... ... ...))

Is there a way for me to insert a loop in ddply so that I can avoid writing each new summary variable out, e.g.

    for (i in 11:n) {
      paste("a", i, "_a", i - 1) = CustomFunction(..... )
    }

I know that this is not how it would actually be done, but I just
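A hedged sketch of one way to avoid writing out each summary: build the column names with paste0() and let sapply() apply the function to each of those columns inside the ddply call. The sample old.data, the range 11:12, and the use of mean as a stand-in for CustomFunction are all assumptions for illustration.

```r
library(plyr)

# Hypothetical data with two of the columns named in the question.
old.data <- data.frame(factor  = c("a", "a", "b", "b"),
                       factor2 = c("x", "x", "y", "y"),
                       a11_a10 = c(1, 2, 3, 4),
                       a12_a11 = c(5, 6, 7, 8))

# Stand-in for the question's CustomFunction (an assumption).
CustomFunction <- mean

# Build "a11_a10", "a12_a11", ... instead of typing each name.
i <- 11:12
cols <- paste0("a", i, "_a", i - 1)

# sapply returns a named vector, which ddply turns into one row per group.
new.data <- ddply(old.data, c("factor", "factor2"),
                  function(df) sapply(df[cols], CustomFunction))
print(new.data)
```

The design choice here is to loop over column names rather than over an index inside the function, so the summary columns keep the names of the source columns.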

subset inside a function by the variables specified in ddply

Question: Often I need to subset a data.frame inside a function by the same variables that I use to split another data.frame with ddply. To do that, I explicitly write the variables again inside the function, and I wonder whether there is a more elegant way. Below I include a trivial example just to show my current approach.

    d1<-expand.grid(x=c('a','b'),y=c('c','d'),z=1:3)
    d2<-expand.grid(x=c('a','b'),y=c('c','d'),z=4:6)
    results<-ddply(d1,.(x,y),function(d) { d2Sub<

dplyr: apply function table() to each column of a data.frame

I often apply the table function to each column of a data frame using plyr, like this:

    library(plyr)
    ldply(mtcars, function(x) data.frame(table(x), prop.table(table(x))))

Is it possible to do this in dplyr also? My attempts fail:

    mtcars %>% do(table %>% data.frame())
    melt(mtcars) %>% do(table %>% data.frame())

Caner: You can try the following, which does not rely on the tidyr package.

    mtcars %>%
      lapply(table) %>%
      lapply(as.data.frame) %>%
      Map(cbind, var = names(mtcars), .) %>%
      rbind_all() %>%
      group_by(var) %>%
      mutate

How do I pass variables to a custom function in ddply?

Consider the following data:

    d = data.frame(
      experiment = as.factor(c("foo", "foo", "foo", "bar", "bar")),
      si = runif(5),
      ti = runif(5)
    )

I would like to perform a correlation test for si and ti, for each experiment factor level. So I thought I'd run:

    ddply(d, .(experiment), cor.test)

But how do I pass the values of si and ti to the cor.test call? I tried this:

    > ddply(d, .(experiment), cor.test, x = si, y = ti)
    Error in .fun(piece, ...) : object 'si' not found
    > ddply(d, .(experiment), cor.test, si, ti)
    Error in match.arg(alternative) : 'arg' must be NULL or a character vector

Is there
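A minimal sketch of one common pattern: wrap cor.test in an anonymous function that pulls the columns out of each group's data frame and returns the values of interest as a one-row data frame. The names piece, estimate, and p.value are illustrative. Note the assumption that each group has at least three observations; the sample data is extended to three "bar" rows because cor.test does not accept fewer.

```r
library(plyr)

set.seed(1)  # only to make the random sample reproducible
d <- data.frame(
  experiment = as.factor(c("foo", "foo", "foo", "bar", "bar", "bar")),
  si = runif(6),
  ti = runif(6)
)

# 'piece' holds the rows for one experiment level; columns are
# accessed explicitly, so cor.test receives plain numeric vectors.
res <- ddply(d, .(experiment), function(piece) {
  ct <- cor.test(piece$si, piece$ti)
  data.frame(estimate = unname(ct$estimate), p.value = ct$p.value)
})
print(res)
```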