plyr | 易学教程

Calculate correlation by aggregating columns of data frame

I have the following data frame:

    y <- data.frame(group = letters[1:5], a = rnorm(5), b = rnorm(5), c = rnorm(5), d = rnorm(5))

How to get a data frame which gives me the correlation between columns a,b and c,d for each row? Something like:

    sapply(y, function(x) {cor(x[2:3],x[4:5])})

Thank you, S

You could use apply:

    > apply(y[,-1],1,function(x) cor(x[1:2],x[3:4]))
    [1] -1 -1 1 -1 1

Or ddply (although this might be overkill, and if two rows have the same group it will do the correlation of columns a&b and c&d for both those rows):

    > ddply(y,.(group),function(x) cor(c(x$a,x$b),c(x$c,x$d)))
    group
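
The apply answer above can be sketched as a small reproducible script. This is only an illustration in base R: the `set.seed` call, the `row_cor` and `result` names and the `cor_ab_cd` column are assumptions added here, not part of the original answer. Note that each correlation is computed from just two pairs of values, so it can only ever be 1 or -1, which is why the output in the answer contains nothing else.

```r
# Reproducible version of the example data frame (set.seed is added here).
set.seed(42)
y <- data.frame(group = letters[1:5], a = rnorm(5), b = rnorm(5),
                c = rnorm(5), d = rnorm(5))

# Row-wise correlation between (a, b) and (c, d), as in the apply answer.
# as.matrix() makes the numeric conversion explicit before apply().
row_cor <- apply(as.matrix(y[, -1]), 1, function(x) cor(x[1:2], x[3:4]))

# Collect the result in a data frame keyed by group.
result <- data.frame(group = y$group, cor_ab_cd = row_cor)
print(result)
```

If a correlation across more than two values per row is wanted, the two index ranges inside `cor()` would need to cover longer, equally sized sets of columns.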

How to populate parameters values present in rows of one dataframe(df1) to dataframe(df2) under same parameter field in R

New to R, please guide!

Dataframe1 contains:

    df1
    Col1  Col2  Col3  Col4  Col5
    A=5   C=1   E=5   F=4   G=2    --Row1
    A=6   B=3   D=6   E=4   F=4    --Row2
    B=2   C=3   D=3   E=3   F=7    --Row3

Dataframe2 contains one row with each parameter as a field name:

    df2 = A B C D E F g .....'n'

Example output (if a value is not found, null is to be printed):

    df2:
    A  B  C  D  E  F  G
    5     1     5  4  2
    6  3     6  4  4
       2  3  3  3  7

How do I populate the values of each parameter from df1 into df2 under the matching parameter, where the parameters appear as field names in the first row?

Create a row number column ( rownames_to_column ), gather into 'long' format, separate the 'val' column into two (by
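
The answer excerpt above is cut off, so the following is not that answer's code. It is a minimal base R sketch of the same idea (split each `name=value` cell and place the value under the matching field), assuming the cells are character strings and the target fields are A to G. The `params`, `keys` and `vals` names are made up for illustration.

```r
# Hypothetical reconstruction of df1 from the question: each cell is "name=value".
df1 <- data.frame(
  Col1 = c("A=5", "A=6", "B=2"),
  Col2 = c("C=1", "B=3", "C=3"),
  Col3 = c("E=5", "D=6", "D=3"),
  Col4 = c("F=4", "E=4", "E=3"),
  Col5 = c("G=2", "F=4", "F=7"),
  stringsAsFactors = FALSE
)

# Target field names (assumed here to be A..G).
params <- LETTERS[1:7]

# Start from an all-NA matrix: one row per df1 row, one column per parameter.
df2 <- matrix(NA_real_, nrow = nrow(df1), ncol = length(params),
              dimnames = list(NULL, params))

for (i in seq_len(nrow(df1))) {
  # Split every "name=value" cell of row i into its two parts.
  parts <- strsplit(unlist(df1[i, ]), "=", fixed = TRUE)
  keys <- vapply(parts, `[`, character(1), 1)
  vals <- as.numeric(vapply(parts, `[`, character(1), 2))
  # Place each value under the column named after its parameter.
  df2[i, keys] <- vals
}

df2 <- as.data.frame(df2)
print(df2)
```

Missing parameters stay `NA` here, which is R's usual stand-in for the "null" the question asks for.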

subset inside a function by the variables specified in ddply

Often I need to subset a data.frame inside a function by the same variables that I use to split another data.frame with ddply. To do that I explicitly write the variables again inside the function, and I wonder whether there is a more elegant way to do it. Below I include a trivial example just to show my current approach.

    d1<-expand.grid(x=c('a','b'),y=c('c','d'),z=1:3)
    d2<-expand.grid(x=c('a','b'),y=c('c','d'),z=4:6)
    results<-ddply(d1,.(x,y),function(d) {
      d2Sub<-subset(d2,x==unique(d$x) & y==unique(d$y))
      out<-d$z+d2Sub$z
      data.frame(out)
    })

The plyr package offers
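
The answer is cut off above, so what plyr offers here is not shown. As one possible alternative, and only as a base R sketch rather than the answer's method, the grouping variables can be named once and both data frames split on them, so the matching pieces are paired by name. The `by_vars`, `p1` and `p2` names are assumptions for illustration.

```r
d1 <- expand.grid(x = c("a", "b"), y = c("c", "d"), z = 1:3)
d2 <- expand.grid(x = c("a", "b"), y = c("c", "d"), z = 4:6)

# Name the grouping variables once.
by_vars <- c("x", "y")

# Split both data frames on the same variables; pieces share names like "a.c".
p1 <- split(d1, d1[by_vars], drop = TRUE)
p2 <- split(d2, d2[by_vars], drop = TRUE)

# Combine matching pieces, keeping the group keys alongside the result.
pieces <- lapply(names(p1), function(k) {
  cbind(p1[[k]][by_vars], out = p1[[k]]$z + p2[[k]]$z)
})
results <- do.call(rbind, pieces)
rownames(results) <- NULL
print(results)
```

This relies on each group having the same number of rows in both data frames, as it does in the example.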

How to create multiple .csv files in R?

Question

I have a .csv file with data for different chromosomes. The chromosome names are stored in the first column (column name: Chr). My aim is to separate the data for each chromosome, i.e. (Chr1, Chr2, etc.), and make a separate csv file for each. I cannot understand how to do this in limited steps. Thanks

Answer 1:

Illustrating a one-liner using plyr and the dataset iris:

    plyr::d_ply(iris, plyr::.(Species), function(x) write.csv(x, file = paste(x$Species[1], ".csv", sep = "")))

Answer 2:

Read Data

    fn <- dir(pattern="csv"
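
For comparison, the same split-and-write idea can be sketched in base R without plyr. This is an illustration only: it uses iris as stand-in data, and the `out_dir` folder under `tempdir()` is a hypothetical location chosen so the example does not touch the working directory. For the chromosome file, the grouping column would be `Chr` instead of `Species`.

```r
# Write one CSV per level of a grouping column, using iris as stand-in data.
out_dir <- file.path(tempdir(), "per_group_csv")  # hypothetical output folder
dir.create(out_dir, showWarnings = FALSE)

# One data frame per species, named after the species.
pieces <- split(iris, iris$Species)

for (name in names(pieces)) {
  write.csv(pieces[[name]],
            file = file.path(out_dir, paste0(name, ".csv")),
            row.names = FALSE)
}

# List what was written.
written <- sort(list.files(out_dir, pattern = "\\.csv$"))
print(written)
```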

Set column name ddply

How to set the column name of the summarized data in

    library(plyr)
    ddply(data,.(col1,col2),nrow)

like in

    ddply(data,.(col1,col2),function(x) data.frame(number=nrow(x)))

Perhaps you are looking for summarize (or mutate or transform, depending on what you want to do). A small example:

    set.seed(1)
    data <- data.frame(col1 = c(1, 2, 2, 3, 3, 4), col2 = c(1, 2, 2, 1, 2, 1), z = rnorm(6))
    ddply(data,.(col1,col2), summarize, number = length(z), newcol = mean(z))
    #   col1 col2 number     newcol
    # 1    1    1      1 -0.6264538
    # 2    2    2      2 -0.3259926
    # 3    3    1      1  1.5952808
    # 4    3    2      1  0.3295078
    # 5    4    1      1 -0.8204684

Source: https:/

pass grouped dataframe to own function in dplyr

I am trying to move from plyr to dplyr. However, I still can't figure out how to call my own functions in a chained dplyr call. I have a data frame with a factorised ID variable and an order variable. I want to split the frame by the ID, order it by the order variable and add a sequence in a new column. My plyr function looks like this:

    f <- function(x) cbind(x[order(x$order_variable), ], Experience = 0:(nrow(x)-1))
    data <- ddply(data, .(ID_variable), f)

In dplyr I thought this should look something like this

    f <- function(x) cbind(x[order(x$order_variable), ], Experience = 0:
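
The dplyr attempt is cut off above, so it is not completed here. To make the intended result concrete, this is a base R sketch (neither plyr nor dplyr) of "order within each ID and number the rows from 0". The sample values are invented; only the column names come from the question.

```r
# Hypothetical data with the column names used in the question.
data <- data.frame(
  ID_variable = factor(c("p1", "p2", "p1", "p2", "p1")),
  order_variable = c(3, 1, 1, 2, 2)
)

# Sort by ID, then by the order variable within each ID.
data <- data[order(data$ID_variable, data$order_variable), ]

# Within each ID, number the rows 0, 1, 2, ...
data$Experience <- ave(seq_len(nrow(data)), data$ID_variable,
                       FUN = function(i) seq_along(i) - 1L)
rownames(data) <- NULL
print(data)
```

Whatever grouped approach is used in dplyr should produce the same `Experience` column as this reference version.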

Use object names within a list in lapply/ldply

In attempting to answer a question earlier, I ran into a problem that seemed like it should be simple, but I couldn't figure out. If I have a list of dataframes:

    df1 <- data.frame(a=1:3, x=rnorm(3))
    df2 <- data.frame(a=1:3, x=rnorm(3))
    df3 <- data.frame(a=1:3, x=rnorm(3))
    df.list <- list(df1, df2, df3)

That I want to rbind together, I can do the following:

    df.all <- ldply(df.list, rbind)

However, I want another column that identifies which data.frame each row came from. I expected to be able to use the deparse(substitute(x)) method (here and elsewhere) to get the name of the relevant data
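
One way to keep track of the origin, sketched here in base R under the assumption that the list can be built with names, is to name the list elements and attach each name as a column before binding. The `source` column name and the `tagged` helper are made up for illustration. (With a named list, plyr's `ldply` also adds an `.id` column holding the element names.)

```r
set.seed(1)
df1 <- data.frame(a = 1:3, x = rnorm(3))
df2 <- data.frame(a = 1:3, x = rnorm(3))
df3 <- data.frame(a = 1:3, x = rnorm(3))

# Naming the list elements keeps the object names available later.
df.list <- list(df1 = df1, df2 = df2, df3 = df3)

# Add a 'source' column to each piece, then bind everything together.
tagged <- Map(function(d, nm) cbind(source = nm, d), df.list, names(df.list))
df.all <- do.call(rbind, tagged)
rownames(df.all) <- NULL
print(df.all)
```

The names have to be supplied when the list is created, because inside `lapply` or `ldply` each element is just a value and its original variable name is no longer attached to it.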

multicore with plyr, MC

Question

Hi, I am trying to use ddply in the plyr library in R, with the MC package. It doesn't seem to be speeding up the computation. This is the code I run:

    require(doMC)
    registerDoMC(4)
    getDoParWorkers()
    ##> 4
    test <- data.frame(x=1:10000, y=rep(c(1:20), 500))
    system.time(ddply(test, "y", mean))
    #   user  system elapsed
    #  0.015   0.000   0.015
    system.time(ddply(test, "y", mean, .parallel=TRUE))
    #    user  system elapsed
    # 223.062   2.825   1.093

Any ideas?

Answer 1:

The mean function operates too quickly relative to

How do you summarize columns based on unique IDs without knowing IDs in R?

I've been going through the posts about summarizing data, but I haven't found what I'm looking for. I wish to create a summary "count table" that shows how often a certain medication was given to patients. The fact that some patients received multiple medications simultaneously doesn't matter, because I simply want a summary of all the medication given and then to calculate what percentage each medication class is of all medication given. The issue is that I don't know the names of the possible medications given; they're "hidden" somewhere in the data.frame, thus,
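
The question is cut off before the data layout is shown, so the following is only a sketch under an assumed layout: one row per patient with the medication classes spread over a couple of columns. All names and values (`meds`, `med1`, `med2`, the class labels) are invented for illustration. The point it shows is that `table()` finds the distinct values itself, so they do not need to be known beforehand.

```r
# Hypothetical data: one row per patient, up to two medication classes each.
meds <- data.frame(
  patient = 1:5,
  med1 = c("statin", "opioid", "statin", "nsaid", "statin"),
  med2 = c("nsaid", NA, "opioid", NA, "nsaid"),
  stringsAsFactors = FALSE
)

# Pool all medication entries into one vector; the names need not be known in advance.
all_meds <- unlist(meds[c("med1", "med2")], use.names = FALSE)

# table() drops NA by default and discovers the distinct values itself.
counts <- table(all_meds)
summary_table <- data.frame(
  medication = names(counts),
  n = as.vector(counts),
  percent = round(100 * as.vector(counts) / sum(counts), 1),
  stringsAsFactors = FALSE
)
print(summary_table)
```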
