plyr

Calculate correlation by aggregating columns of data frame

邮差的信 submitted on 2019-12-01 10:45:12
I have the following data frame:

    y <- data.frame(group = letters[1:5], a = rnorm(5), b = rnorm(5), c = rnorm(5), d = rnorm(5))

How can I get a data frame that gives me the correlation between columns a,b and c,d for each row? Something like:

    sapply(y, function(x) {cor(x[2:3],x[4:5])})

Thank you, S

You could use apply:

    > apply(y[,-1],1,function(x) cor(x[1:2],x[3:4]))
    [1] -1 -1  1 -1  1

Or ddply (although this might be overkill, and if two rows have the same group it will do the correlation of columns a&b and c&d for both those rows):

    > ddply(y,.(group),function(x) cor(c(x$a,x$b),c(x$c,x$d)))
    group
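The apply answer above can be sketched in base R as follows. One thing the sketch makes visible: each row contributes only two values per vector, so the correlation can only ever be -1 or 1, which matches the output shown in the answer. The seed and the `result` data frame are assumptions added for illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
y <- data.frame(group = letters[1:5],
                a = rnorm(5), b = rnorm(5), c = rnorm(5), d = rnorm(5))

# Row-wise correlation between (a, b) and (c, d).
# Selecting only the numeric columns keeps apply() from coercing to character.
row_cor <- apply(as.matrix(y[, c("a", "b", "c", "d")]), 1,
                 function(x) cor(x[1:2], x[3:4]))

# Attach the group label so the result is a data frame, as the question asks.
result <- data.frame(group = y$group, cor = row_cor)
print(result)
```

A more informative correlation would need more than two observations per vector, for example several rows per group.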

How to populate parameters values present in rows of one dataframe(df1) to dataframe(df2) under same parameter field in R

杀马特。学长 韩版系。学妹 submitted on 2019-12-01 10:33:55
New to R, please guide! Dataframe1 contains:

df1
Col1  Col2  Col3  Col4  Col5
A=5   C=1   E=5   F=4   G=2   --Row1
A=6   B=3   D=6   E=4   F=4   --Row2
B=2   C=3   D=3   E=3   F=7   --Row3

Dataframe2 contains one row with each parameter as a field name:

df2 = A B C D E F G .....'n'

Example output (if a value is not found, null should be printed):

df2:
A  B  C  D  E  F  G
5     1     5  4  2
6  3     6  4  4
   2  3  3  3  7

How do I populate the values of each parameter from df1 into df2, under the matching parameter name in df2's first row?

Create a row number column ( rownames_to_column ), gather into 'long' format, separate the 'val' column into two (by
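The reshaping the question asks for can also be sketched in base R, without the tidyverse helpers the answer starts to describe. This is an alternative technique under stated assumptions, not the answer's own code: the `params` vector is assumed to be known in advance, and missing parameters are represented as NA.

```r
# Toy input mirroring the question: each cell holds a "name=value" string.
df1 <- data.frame(
  Col1 = c("A=5", "A=6", "B=2"),
  Col2 = c("C=1", "B=3", "C=3"),
  Col3 = c("E=5", "D=6", "D=3"),
  Col4 = c("F=4", "E=4", "E=3"),
  Col5 = c("G=2", "F=4", "F=7"),
  stringsAsFactors = FALSE
)

params <- c("A", "B", "C", "D", "E", "F", "G")  # assumed field names of df2

# Build one row of df2 per row of df1; parameters that are absent stay NA.
df2 <- as.data.frame(t(apply(df1, 1, function(cells) {
  parts <- strsplit(cells, "=", fixed = TRUE)
  keys <- vapply(parts, `[`, character(1), 1)              # text before "="
  vals <- as.numeric(vapply(parts, `[`, character(1), 2))  # text after "="
  setNames(vals[match(params, keys)], params)
})))
print(df2)
```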

subset inside a function by the variables specified in ddply

我与影子孤独终老i submitted on 2019-12-01 08:44:51
Often, inside a function that I pass to ddply, I need to subset a second data.frame by the same variables that ddply is splitting the first data.frame on. To do that I explicitly write the variables again inside the function, and I wonder whether there is a more elegant way. Below is a trivial example showing my current approach.

    d1<-expand.grid(x=c('a','b'),y=c('c','d'),z=1:3)
    d2<-expand.grid(x=c('a','b'),y=c('c','d'),z=4:6)
    results<-ddply(d1,.(x,y),function(d) {
      d2Sub<-subset(d2,x==unique(d$x) & y==unique(d$y))
      out<-d$z+d2Sub$z
      data.frame(out)
    })

The plyr package offers

How to create multiple .csv files in R?

元气小坏坏 submitted on 2019-12-01 08:14:24
Question

I have a .csv file with data for different chromosomes. The chromosome names are stored in the first column (column name: Chr). My aim is to separate the data for each chromosome (Chr1, Chr2, etc.) and write a separate csv file for each. I cannot work out how to do this in a few steps. Thanks

Answer 1:

Illustrating a one-liner using plyr and the dataset iris:

    plyr::d_ply(iris, "Species", function(x) write.csv(x, file = paste0(unique(x$Species), ".csv")))

Answer 2:

Read Data

    fn <- dir(pattern="csv"
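The same split-and-write idea can be sketched in base R with split(). The `dat` data frame and the temporary output directory are stand-ins invented for illustration; in practice `dat` would come from read.csv() on the original file.

```r
# Stand-in for the question's data: a "Chr" column plus one value column.
dat <- data.frame(Chr = c("Chr1", "Chr1", "Chr2", "Chr3"),
                  pos = c(100, 250, 75, 900))

out_dir <- tempdir()  # hypothetical output directory

# split() yields one data frame per chromosome, named by the Chr value.
pieces <- split(dat, dat$Chr)

# Write each piece to its own file, e.g. Chr1.csv, Chr2.csv, ...
for (chr in names(pieces)) {
  write.csv(pieces[[chr]],
            file = file.path(out_dir, paste0(chr, ".csv")),
            row.names = FALSE)
}
```

Each file name is built from a single group label, so write.csv() always receives exactly one path.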

Set column name ddply

て烟熏妆下的殇ゞ submitted on 2019-12-01 07:41:52
How can I set the column name of the summarized data in

    library(plyr)
    ddply(data,.(col1,col2),nrow)

like in

    ddply(data,.(col1,col2),function(x) data.frame(number=nrow(x)))

Perhaps you are looking for summarize (or mutate or transform, depending on what you want to do). A small example:

    set.seed(1)
    data <- data.frame(col1 = c(1, 2, 2, 3, 3, 4), col2 = c(1, 2, 2, 1, 2, 1), z = rnorm(6))
    ddply(data,.(col1,col2), summarize, number = length(z), newcol = mean(z))
    #   col1 col2 number     newcol
    # 1    1    1      1 -0.6264538
    # 2    2    2      2 -0.3259926
    # 3    3    1      1  1.5952808
    # 4    3    2      1  0.3295078
    # 5    4    1      1 -0.8204684

Source: https:/

pass grouped dataframe to own function in dplyr

ぐ巨炮叔叔 submitted on 2019-12-01 04:58:00
I am trying to move from plyr to dplyr. However, I still can't figure out how to call my own functions in a chained dplyr pipeline. I have a data frame with a factorised ID variable and an order variable. I want to split the frame by the ID, order it by the order variable and add a sequence in a new column. My plyr function looks like this:

    f <- function(x) cbind(x[order(x$order_variable), ], Experience = 0:(nrow(x)-1))
    data <- ddply(data, .(ID_variable), f)

In dplyr I thought this should look something like this:

    f <- function(x) cbind(x[order(x$order_variable), ], Experience = 0:
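One possible dplyr formulation of the split, order and sequence steps described above is sketched here. It replaces the custom function with built-in verbs rather than showing how to call `f` itself, and the toy values are made up; only the column names come from the question.

```r
library(dplyr)

# Toy data; the column names come from the question, the values are invented.
data <- data.frame(ID_variable = factor(c("a", "b", "a", "b", "a")),
                   order_variable = c(3, 1, 1, 2, 2))

data <- data %>%
  arrange(ID_variable, order_variable) %>%   # order rows within each ID
  group_by(ID_variable) %>%
  mutate(Experience = row_number() - 1) %>%  # 0, 1, 2, ... restarting per ID
  ungroup()

print(data)
```

Because mutate() runs per group after group_by(), row_number() restarts at 1 for every ID, which plays the role of `0:(nrow(x)-1)` in the plyr version.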

Use object names within a list in lapply/ldply

你。 submitted on 2019-12-01 04:08:57
In attempting to answer a question earlier, I ran into a problem that seemed like it should be simple, but I couldn't figure out. If I have a list of dataframes:

    df1 <- data.frame(a=1:3, x=rnorm(3))
    df2 <- data.frame(a=1:3, x=rnorm(3))
    df3 <- data.frame(a=1:3, x=rnorm(3))
    df.list <- list(df1, df2, df3)

that I want to rbind together, I can do the following:

    df.all <- ldply(df.list, rbind)

However, I want another column that identifies which data.frame each row came from. I expected to be able to use the deparse(substitute(x)) method (here and elsewhere) to get the name of the relevant data
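One way to get the identifying column, sketched under the assumption that the list can be given names when it is built: ldply() adds an `.id` column holding the list names when its input is a named list. Inside lapply or ldply the function only sees the list element, so deparse(substitute(x)) cannot recover the original variable name.

```r
library(plyr)

df1 <- data.frame(a = 1:3, x = rnorm(3))
df2 <- data.frame(a = 1:3, x = rnorm(3))
df3 <- data.frame(a = 1:3, x = rnorm(3))

# Name the list elements explicitly; the names become the identifier.
df.list <- list(df1 = df1, df2 = df2, df3 = df3)

# With a named list, ldply() prepends an ".id" column with the element names.
df.all <- ldply(df.list)
print(df.all)
```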

multicore with plyr, MC

守給你的承諾、 submitted on 2019-12-01 02:25:32
Question

Hi, I am trying to use ddply from the plyr library in R with the doMC package. It doesn't seem to be speeding up the computation. This is the code I run:

    require(doMC)
    registerDoMC(4)
    getDoParWorkers()
    ##> 4
    test <- data.frame(x=1:10000, y=rep(c(1:20), 500))
    system.time(ddply(test, "y", mean))
    #    user  system elapsed
    #   0.015   0.000   0.015
    system.time(ddply(test, "y", mean, .parallel=TRUE))
    #    user  system elapsed
    # 223.062   2.825   1.093

Any ideas?

Answer 1:

The mean function operates too quickly relative to

How do you summarize columns based on unique IDs without knowing IDs in R?

一笑奈何 submitted on 2019-12-01 02:01:00
I've been going through the posts about summarizing data, but haven't found what I'm looking for. I wish to create a summary "count-table" that lets me see how often a certain medication was given to patients. The fact that some patients received multiple medications simultaneously doesn't matter, because I simply want a summary of all the medication given, and then to calculate what percentage of all medication given each medication class represents. The issue is that I don't know the names of the possible medications given; they're "hidden" somewhere in the data.frame, thus,
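A count table of this kind can be sketched in base R with table(), which discovers the distinct values on its own, so the medication names do not need to be known beforehand. The layout of `patients` (one row per patient, two medication columns) is a hypothetical stand-in, since the question does not show the real data.

```r
# Hypothetical layout: one row per patient, up to two medication columns.
patients <- data.frame(id = 1:4,
                       med1 = c("statin", "aspirin", "statin", "insulin"),
                       med2 = c("aspirin", NA, NA, "statin"),
                       stringsAsFactors = FALSE)

# Pool every medication entry into one vector; table() finds the distinct
# names itself and drops NA by default.
all_meds <- unlist(patients[, c("med1", "med2")], use.names = FALSE)
counts <- table(all_meds)

# Counts plus each medication's share of all medication given.
summary_tbl <- data.frame(
  medication = names(counts),
  n = as.vector(counts),
  percent = round(100 * as.vector(prop.table(counts)), 1)
)
print(summary_tbl)
```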

Use object names within a list in lapply/ldply

时间秒杀一切 submitted on 2019-12-01 01:54:45
Question

In attempting to answer a question earlier, I ran into a problem that seemed like it should be simple, but I couldn't figure out. If I have a list of dataframes:

    df1 <- data.frame(a=1:3, x=rnorm(3))
    df2 <- data.frame(a=1:3, x=rnorm(3))
    df3 <- data.frame(a=1:3, x=rnorm(3))
    df.list <- list(df1, df2, df3)

that I want to rbind together, I can do the following:

    df.all <- ldply(df.list, rbind)

However, I want another column that identifies which data.frame each row came from. I expected to be able