plyr | 易学教程

how to operate with a subset of an R dataframe in long format?

I have a data frame with 3 groups and 3 days:

    set.seed(10)
    dat <- data.frame(group=rep(c("g1","g2","g3"),each=3), day=rep(c(0,2,4),3), value=runif(9))
    #   group day    value
    # 1    g1   0 0.507478
    # 2    g1   2 0.306769
    # 3    g1   4 0.426908
    # 4    g2   0 0.693102
    # 5    g2   2 0.085136
    # 6    g2   4 0.225437
    # 7    g3   0 0.274531
    # 8    g3   2 0.272305
    # 9    g3   4 0.615829

I want to take the log2 and divide each value with the day 0 value within each group. The way I'm doing it now is by calculating each day group in an intermediate step:

    day_0 <- dat[dat$day==0, "value"]
    day_2 <- dat[dat$day==2, "value"]
    day_4 <- dat[dat$day==4, "value"
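
One way to avoid the per-day intermediate vectors is to let plyr split the data by group and compute the ratio inside each piece. The following is a minimal sketch, assuming the goal is the log2 of each value divided by that group's day 0 value; the column name `log2ratio` is made up for illustration.

```r
library(plyr)

# Same data as in the question
set.seed(10)
dat <- data.frame(group = rep(c("g1", "g2", "g3"), each = 3),
                  day   = rep(c(0, 2, 4), 3),
                  value = runif(9))

# Split by group; within each piece, divide every value by the
# day 0 value of that piece and take log2 of the ratio.
res <- ddply(dat, .(group), transform,
             log2ratio = log2(value / value[day == 0]))
res
```

Each day 0 row gets a `log2ratio` of 0, since it is divided by itself. The sketch assumes every group has exactly one day 0 row.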

Passing a character vector as arguments to a function in plyr

Question: I suspect I'm Doing It Wrong, but I'd like to pass a character vector as an argument to a function in ddply. There's a lot of Q&A on removing quotes, etc., but none of it seems to work for me (e.g. Remove quotes from a character vector in R and http://r.789695.n4.nabble.com/Pass-character-vector-to-function-argument-td3045226.html).

    # reproducible data
    df1<-data.frame(a=sample(1:50,10),b=sample(1:50,10),c=sample(1:50,10),d=(c("a","b","c","a","a","b","b","a","c","d")))
    df2<-data.frame(a=sample

Transpose duplicated rows to column in R

I have a large data.frame (20000+ entries) in this format:

    id   D1   D2
     1 0.40 0.21
     1 0.00 0.00
     1 0.53 0.20
     2 0.17 0.17
     2 0.25 0.25
     2 0.55 0.43

Where each id may be duplicated 3-20 times. I would like to merge the duplicated rows into new columns, so my new data.frame looks like:

    id   D1   D2   D3   D4   D5   D6
     1 0.40 0.21 0.00 0.00 0.53 0.20
     2 0.17 0.17 0.25 0.25 0.55 0.43

I've manipulated data.frames before with plyr, but I'm not sure how to approach this problem. Any help would be appreciated. Thanks.

The best option would be to just use melt and dcast from "reshape2". But before we jump to that option,
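
The melt and dcast route mentioned in the answer can be sketched as follows. This is an illustration under assumptions, not the answer's actual code (which is cut off above): the helper column `obs`, which numbers the rows within each id, is introduced here only so that dcast has something to spread on.

```r
library(reshape2)

# The six example rows from the question
df <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                 D1 = c(0.40, 0.00, 0.53, 0.17, 0.25, 0.55),
                 D2 = c(0.21, 0.00, 0.20, 0.17, 0.25, 0.43))

# Number the repeated rows within each id (1, 2, 3, ...)
df$obs <- ave(df$id, df$id, FUN = seq_along)

# Long format: one row per (id, obs, variable)
long <- melt(df, id.vars = c("id", "obs"))

# Wide format: one row per id, one column per obs/variable pair
wide <- dcast(long, id ~ obs + variable, value.var = "value")

# Rename the spread columns to D1, D2, ... as in the desired output
names(wide)[-1] <- paste0("D", seq_len(ncol(wide) - 1))
wide
```

When ids have different numbers of duplicates, the shorter ids are padded with NA in the trailing columns.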

dplyr equivalent to ddply in plyr diamonds example

ok, I'm trying to wrap my head around dplyr, using it instead of plyr. In my short time with R I've grown somewhat accustomed to ddply. I'm using a "simple" example for how to use dplyr as opposed to ddply in plyr. Here goes: in the following:

    t1.table <- ddply(diamonds, c("clarity", "cut"), "nrow")

I receive a summary table of counts of diamonds by clarity and cut. In dplyr, the simplest example I can come up with is:

    diamonds %>%
      select(clarity, cut) %>%
      group_by(clarity, cut) %>%
      summarise(count=n()) -> t2.table

which seems a bit more involved. Is there a better way to simplify this?
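
A shorter dplyr form is available through `count()`, which combines the grouping and the counting in one call. The sketch below assumes the `diamonds` data set from ggplot2, as in the question; the name `t3.table` is made up here. The `select()` step is not needed, because `count()` only keeps the grouping columns and the count.

```r
library(ggplot2)  # provides the diamonds data set
library(dplyr)

# One row per clarity/cut combination; the count lands in a column named n
t3.table <- diamonds %>% count(clarity, cut)
t3.table
```

Unlike the `group_by()` plus `summarise()` version, the result is not left grouped, and the count column is called `n` rather than `count`.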

reshaping data (a faster way)

I came across a table of freq. counts today I had to expand into a data frame of raw values. I was able to do it but was wondering if there's a faster way using the reshape package or data.table? The original table looked like this:

       i1 i2 i3 i4  m  f
    1   0  0  0  0 22 29
    2   1  0  0  0 30 50
    3   0  1  0  0 13 15
    4   0  0  1  0  1  6
    5   1  1  0  0 24 67
    6   1  0  1  0  5 12
    7   0  1  1  0  1  2
    8   1  1  1  0 10 22
    9   0  0  0  1 10  7
    10  1  0  0  1 27 30
    11  0  1  0  1 14  4
    12  0  0  1  1  1  0
    13  1  1  0  1 54 63
    14  1  0  1  1  8 10
    15  0  1  1  1  8  6
    16  1  1  1  1 57 51

Here's an easy grab of the data using dput:

    dat <- structure(list(i1 = c(0L, 1L, 0L, 0L, 1L, 1L, 0L,
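
For illustration, the expansion can be done by repeating row indices with `rep()`. This is a base R sketch rather than the reshape or data.table solution the question asks about, and it assumes the desired result is one row per individual with a `sex` column; that target layout is a guess, since the question is cut off. Only the first three rows of the table are used, and the names `tab`, `items`, and `raw` are made up here.

```r
# First three rows of the frequency table shown above
tab <- data.frame(i1 = c(0, 1, 0), i2 = c(0, 0, 1), i3 = 0, i4 = 0,
                  m = c(22, 30, 13), f = c(29, 50, 15))

items <- tab[, c("i1", "i2", "i3", "i4")]

# Repeat each row index as many times as its count, once per sex
raw <- rbind(
  data.frame(items[rep(seq_len(nrow(tab)), tab$m), ], sex = "m"),
  data.frame(items[rep(seq_len(nrow(tab)), tab$f), ], sex = "f")
)
rownames(raw) <- NULL

nrow(raw)  # equals sum(tab$m) + sum(tab$f)
```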

Parallel *ply within functions

I want to use the parallel functionality of the plyr package within functions. I would have thought that the proper way to export objects that have been created within the body of the function (in this example, the object is df_2) is as follows:

    # rm(list=ls())
    library(plyr)
    library(doParallel)
    workers=makeCluster(2)
    registerDoParallel(workers,core=2)
    plyr_test=function() {
      df_1=data.frame(type=c("a","b"),x=1:2)
      df_2=data.frame(type=c("a","b"),x=3:4)
      #export df_2 via .paropts
      ddply(df_1,"type",.parallel=TRUE,.paropts=list(.export="df_2"),.fun=function(y) {
        merge(y,df_2,all=FALSE,by="type")
      })

lm called from inside dlply throws “0 (non-NA) cases” error [r]

I'm using dlply() with a custom function that averages slopes of lm() fits on data that contain some NA values, and I get the error

    Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases

This error only happens when I call dlply with two key variables - separating by one variable works fine. Annoyingly I can't reproduce the error with a simple dataset, so I've posted the problem dataset in my dropbox. Here's the code, as minimized as possible while still producing an error:

    masterData <- read.csv("http://dl.dropbox.com/u/48901983/SOquestionData.csv", na

How do I pass variables to a custom function in ddply?

Question: Consider the following data:

    d = data.frame(
      experiment = as.factor(c("foo", "foo", "foo", "bar", "bar")),
      si = runif(5),
      ti = runif(5)
    )

I would like to perform a correlation test for si and ti, for each experiment factor level. So I thought I'd run:

    ddply(d, .(experiment), cor.test)

But how do I pass the values of si and ti to the cor.test call? I tried this:

    > ddply(d, .(experiment), cor.test, x = si, y = ti)
    Error in .fun(piece, ...) : object 'si' not found
    > ddply(d, .(experiment), cor
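
The "object 'si' not found" error arises because `si` and `ti` are evaluated before ddply splits the data, so they are looked up outside the data frame. A common workaround is to wrap `cor.test` in an anonymous function that pulls the columns out of each piece. The sketch below is one possible approach, not necessarily the accepted answer. It uses hypothetical data with ten rows per level, because `cor.test` needs at least three observations and the "bar" group in the question has only two.

```r
library(plyr)

# Hypothetical data: same columns as in the question, more rows per level
set.seed(1)
d <- data.frame(experiment = factor(rep(c("foo", "bar"), each = 10)),
                si = runif(20),
                ti = runif(20))

# ddply passes each subset to the function as a data frame ("piece"),
# so the columns are referenced explicitly inside it.
res <- ddply(d, .(experiment), function(piece) {
  ct <- cor.test(piece$si, piece$ti)
  data.frame(estimate = unname(ct$estimate), p.value = ct$p.value)
})
res
```

Returning a one-row data frame per piece lets ddply bind the results into a single table with one row per experiment level.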

difftime between rows using dplyr

Question: I'm trying to calculate the time difference between two timestamps in two adjacent rows using the dplyr package. Here's the code:

    tidy_ex <- function () {
      library(dplyr)
      #construct example data
      data <- data.frame(code = c(10888, 10888, 10888, 10888, 10888, 10888, 10889, 10889, 10889, 10889, 10889, 10889, 10890, 10890, 10890),
                         station = c("F1", "F3", "F4", "F5", "L5", "L7", "F1", "F3", "F4", "L5", "L6", "L7", "F1", "F3", "F5"),
                         timestamp = c(1365895151, 1365969188, 1366105495, 1367433149,

Subset a list - a plyr way?

Question: I often have data that is grouped by one or more variables, with several registrations within each group. From the data frame, I wish to select groups according to various criteria. I commonly use a split-sapply-rbind approach, where I extract elements from a list using a logical vector. Here is a small example. I start with a data frame with one grouping variable ('group'), and I wish to select groups that have a maximum mass of less than 45:

    dd <- data.frame(group = rep(letters[1:3], each =