plyr

How to operate on a subset of an R data frame in long format?

限于喜欢 submitted on 2019-12-05 10:24:01
I have a data frame with 3 groups and 3 days:

    set.seed(10)
    dat <- data.frame(group=rep(c("g1","g2","g3"),each=3), day=rep(c(0,2,4),3), value=runif(9))
    #   group day    value
    # 1    g1   0 0.507478
    # 2    g1   2 0.306769
    # 3    g1   4 0.426908
    # 4    g2   0 0.693102
    # 5    g2   2 0.085136
    # 6    g2   4 0.225437
    # 7    g3   0 0.274531
    # 8    g3   2 0.272305
    # 9    g3   4 0.615829

I want to take the log2 and divide each value by the day 0 value within each group. The way I'm doing it now is by calculating each day group in an intermediate step:

    day_0 <- dat[dat$day==0, "value"]
    day_2 <- dat[dat$day==2, "value"]
    day_4 <- dat[dat$day==4, "value"
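One way to avoid the per-day intermediate vectors is to look up each row's day 0 baseline by group. The following is a minimal base R sketch, assuming that "take the log2 and divide" means log2(value / day 0 value); the names `baseline` and `log2_ratio` are illustrative and not from the question.

```r
# Same example data as in the question.
set.seed(10)
dat <- data.frame(group = rep(c("g1", "g2", "g3"), each = 3),
                  day   = rep(c(0, 2, 4), 3),
                  value = runif(9))

# One baseline (day 0) row per group.
baseline <- dat[dat$day == 0, c("group", "value")]

# Look up each row's group baseline, then take log2 of the ratio.
# Assumption: the wanted quantity is log2(value / day-0 value).
dat$log2_ratio <- log2(dat$value /
                         baseline$value[match(dat$group, baseline$group)])
```

Every day 0 row ends up with a ratio of 1 and therefore a log2 ratio of 0. The same per-group idea could probably also be written with `ddply(dat, "group", transform, ...)`, but that variant is not shown here.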

Passing a character vector as arguments to a function in plyr

丶灬走出姿态 submitted on 2019-12-05 10:01:42
Question: I suspect I'm Doing It Wrong, but I'd like to pass a character vector as an argument to a function in ddply. There's a lot of Q&A on removing quotes, etc., but none of it seems to work for me (e.g. Remove quotes from a character vector in R and http://r.789695.n4.nabble.com/Pass-character-vector-to-function-argument-td3045226.html).

    # reproducible data
    df1<-data.frame(a=sample(1:50,10),b=sample(1:50,10),c=sample(1:50,10),d=(c("a","b","c","a","a","b","b","a","c","d")))
    df2<-data.frame(a=sample
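The question is cut off, so the exact goal is unknown. As a hedged sketch of one common reading: a character vector of column names can be handed to the function through ddply's `...` and used for indexing inside it, with no need to strip quotes. The helper `col_means` and the vector `cols` are hypothetical names introduced here.

```r
library(plyr)

set.seed(42)
df1 <- data.frame(a = sample(1:50, 10),
                  b = sample(1:50, 10),
                  c = sample(1:50, 10),
                  d = c("a", "b", "c", "a", "a", "b", "b", "a", "c", "d"))

# Hypothetical helper: average the columns named in `cols` for one piece.
col_means <- function(piece, cols) {
  as.data.frame(t(colMeans(piece[, cols, drop = FALSE])))
}

cols <- c("a", "b")  # character vector of column names

# Extra named arguments after the function are forwarded to it via `...`.
res <- ddply(df1, "d", col_means, cols = cols)
```

Indexing with `piece[, cols]` keeps the names as plain strings throughout, which sidesteps the quoting problem the question alludes to.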

Transpose duplicated rows to column in R

三世轮回 submitted on 2019-12-05 09:39:16
I have a large data.frame (20000+ entries) in this format:

    id   D1   D2
     1 0.40 0.21
     1 0.00 0.00
     1 0.53 0.20
     2 0.17 0.17
     2 0.25 0.25
     2 0.55 0.43

where each id may be duplicated 3-20 times. I would like to merge the duplicated rows into new columns, so my new data.frame looks like:

    id   D1   D2   D3   D4   D5   D6
     1 0.40 0.21 0.00 0.00 0.53 0.20
     2 0.17 0.17 0.25 0.25 0.55 0.43

I've manipulated data.frames before with plyr, but I'm not sure how to approach this problem. Any help would be appreciated. Thanks.

The best option would be to just use melt and dcast from "reshape2". But before we jump to that option,
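The answer excerpt is truncated before it shows the melt/dcast code, so the following is a different, base R sketch of the same reshaping rather than the answerer's method. It flattens each id's rows in row order and pads shorter groups with NA, on the assumption that ids can have unequal numbers of rows.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                 D1 = c(0.40, 0.00, 0.53, 0.17, 0.25, 0.55),
                 D2 = c(0.21, 0.00, 0.20, 0.17, 0.25, 0.43))

# Split the value columns by id and flatten each piece row by row
# (D1, D2 of the first row, then D1, D2 of the second row, ...).
pieces <- split(df[, c("D1", "D2")], df$id)
rows   <- lapply(pieces, function(piece) as.vector(t(as.matrix(piece))))

# Pad shorter groups with NA so every id has the same number of columns.
width <- max(lengths(rows))
wide  <- t(vapply(rows,
                  function(v) c(v, rep(NA_real_, width - length(v))),
                  numeric(width)))
colnames(wide) <- paste0("D", seq_len(width))

# Note: id comes back as character here because it is taken from list names.
wide <- data.frame(id = names(rows), wide, row.names = NULL)
```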

dplyr equivalent to ddply in plyr diamonds example

≯℡__Kan透↙ submitted on 2019-12-05 09:35:06
OK, I'm trying to wrap my head around dplyr, using it instead of plyr. In my short time with R I've grown somewhat accustomed to ddply. I'm using a "simple" example for how to use dplyr as opposed to ddply in plyr. Here goes:

    t1.table <- ddply(diamonds, c("clarity", "cut"), "nrow")

gives me a summary table of counts of diamonds by clarity and cut. In dplyr, the simplest example I can come up with is:

    diamonds %>% select(clarity, cut) %>% group_by(clarity, cut) %>% summarise(count=n()) -> t2.table

which seems a bit more involved. Is there a better way to simplify this?
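A shorter form that may fit here is dplyr's `count()`, which groups and tallies in one call. The sketch below uses the built-in `mtcars` data as a stand-in so that it runs without ggplot2; substituting `diamonds`, `clarity`, and `cut` should give the table the question asks for, though that substitution is untested here.

```r
library(dplyr)

# Stand-in data: mtcars instead of diamonds (assumption, for self-containment).
# count() groups by the listed columns and counts rows per combination;
# `name` sets the name of the count column.
counts <- mtcars %>% count(cyl, gear, name = "count")
```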

reshaping data (a faster way)

旧街凉风 submitted on 2019-12-05 09:32:27
I came across a table of frequency counts today that I had to expand into a data frame of raw values. I was able to do it, but was wondering if there's a faster way using the reshape package or data.table. The original table looked like this:

       i1 i2 i3 i4  m  f
    1   0  0  0  0 22 29
    2   1  0  0  0 30 50
    3   0  1  0  0 13 15
    4   0  0  1  0  1  6
    5   1  1  0  0 24 67
    6   1  0  1  0  5 12
    7   0  1  1  0  1  2
    8   1  1  1  0 10 22
    9   0  0  0  1 10  7
    10  1  0  0  1 27 30
    11  0  1  0  1 14  4
    12  0  0  1  1  1  0
    13  1  1  0  1 54 63
    14  1  0  1  1  8 10
    15  0  1  1  1  8  6
    16  1  1  1  1 57 51

Here's an easy grab of the data using dput:

    dat <- structure(list(i1 = c(0L, 1L, 0L, 0L, 1L, 1L, 0L,
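One base R idiom for this kind of expansion is to repeat each row index as many times as its count. The sketch below uses only the first four rows of the table and assumes that `m` and `f` are counts for two groups that should become a single `sex` column; that interpretation, the column name `sex`, and the helper `expand_counts` are not from the question.

```r
# First four rows of the frequency table (subset chosen for brevity).
dat <- data.frame(i1 = c(0L, 1L, 0L, 0L),
                  i2 = c(0L, 0L, 1L, 0L),
                  i3 = c(0L, 0L, 0L, 1L),
                  i4 = c(0L, 0L, 0L, 0L),
                  m  = c(22L, 30L, 13L, 1L),
                  f  = c(29L, 50L, 15L, 6L))

items <- c("i1", "i2", "i3", "i4")

# Hypothetical helper: repeat each row d[[count_col]] times and label it.
expand_counts <- function(d, count_col) {
  out <- d[rep(seq_len(nrow(d)), d[[count_col]]), items]
  out$sex <- rep(count_col, nrow(out))
  out
}

raw <- rbind(expand_counts(dat, "m"), expand_counts(dat, "f"))
rownames(raw) <- NULL
```

Rows with a count of zero simply drop out, since `rep()` repeats their index zero times.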

Parallel *ply within functions

末鹿安然 submitted on 2019-12-05 09:14:21
I want to use the parallel functionality of the plyr package within functions. I would have thought that the proper way to export objects that have been created within the body of the function (in this example, the object is df_2) is as follows:

    # rm(list=ls())
    library(plyr)
    library(doParallel)
    workers=makeCluster(2)
    registerDoParallel(workers,core=2)
    plyr_test=function() {
      df_1=data.frame(type=c("a","b"),x=1:2)
      df_2=data.frame(type=c("a","b"),x=3:4)
      #export df_2 via .paropts
      ddply(df_1,"type",.parallel=TRUE,.paropts=list(.export="df_2"),.fun=function(y) {
        merge(y,df_2,all=FALSE,by="type")
      })

lm called from inside dlply throws “0 (non-NA) cases” error [r]

淺唱寂寞╮ submitted on 2019-12-05 02:58:01
I'm using dlply() with a custom function that averages slopes of lm() fits on data that contain some NA values, and I get the error:

    Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases

This error only happens when I call dlply with two key variables; separating by one variable works fine. Annoyingly, I can't reproduce the error with a simple dataset, so I've posted the problem dataset in my Dropbox. Here's the code, as minimized as possible while still producing an error:

    masterData <- read.csv("http://dl.dropbox.com/u/48901983/SOquestionData.csv", na
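The excerpt does not show the cause, but one plausible explanation is that splitting by two variables produces at least one piece whose response is entirely NA, which lm() cannot fit. The sketch below guards against that case on made-up data; the data frame `d`, its columns, and the helper `safe_slope` are all hypothetical.

```r
library(plyr)

# Made-up data: the (a = "y", b = "p") piece has no non-NA response values.
d <- data.frame(a    = rep(c("x", "y"), each = 6),
                b    = rep(rep(c("p", "q"), each = 3), 2),
                time = rep(1:3, 4),
                y    = c(1, 2, 3,  2, 4, 6,  NA, NA, NA,  3, 2, 1))

# Hypothetical helper: return NA instead of calling lm() on an unusable piece.
safe_slope <- function(piece) {
  ok <- complete.cases(piece[, c("time", "y")])
  if (sum(ok) < 2) return(NA_real_)
  unname(coef(lm(y ~ time, data = piece[ok, ]))[2])
}

slopes <- dlply(d, .(a, b), safe_slope)
```

Checking which pieces come back as NA is also a quick way to find the offending combination of key variables in the real dataset.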

How do I pass variables to a custom function in ddply?

狂风中的少年 submitted on 2019-12-05 02:34:43
Question: Consider the following data:

    d = data.frame(
      experiment = as.factor(c("foo", "foo", "foo", "bar", "bar")),
      si = runif(5),
      ti = runif(5)
    )

I would like to perform a correlation test for si and ti, for each experiment factor level. So I thought I'd run:

    ddply(d, .(experiment), cor.test)

But how do I pass the values of si and ti to the cor.test call? I tried this:

    > ddply(d, .(experiment), cor.test, x = si, y = ti)
    Error in .fun(piece, ...) : object 'si' not found
    > ddply(d, .(experiment), cor
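A common pattern for this is to wrap cor.test in an anonymous function that pulls the columns out of each piece and returns a one-row data frame. The sketch below assumes five observations per level rather than the question's 3 and 2, because cor.test needs at least three finite pairs and the "bar" level in the original data has only two.

```r
library(plyr)

set.seed(1)
# Assumption: five rows per experiment level (the question has 3 and 2).
d <- data.frame(experiment = factor(rep(c("foo", "bar"), each = 5)),
                si = runif(10),
                ti = runif(10))

# Each piece is a data frame, so the columns are reached as piece$si, piece$ti.
res <- ddply(d, .(experiment), function(piece) {
  ct <- cor.test(piece$si, piece$ti)
  data.frame(estimate = unname(ct$estimate), p.value = ct$p.value)
})
```

The "object 'si' not found" error in the question arises because `x = si` is evaluated in the calling environment, where no such object exists, rather than inside each piece.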

difftime between rows using dplyr

只愿长相守 submitted on 2019-12-05 02:04:01
Question: I'm trying to calculate the time difference between two timestamps in two adjacent rows using the dplyr package. Here's the code:

    tidy_ex <- function () {
      library(dplyr)
      #construct example data
      data <- data.frame(code = c(10888, 10888, 10888, 10888, 10888, 10888, 10889, 10889, 10889, 10889, 10889, 10889, 10890, 10890, 10890),
                         station = c("F1", "F3", "F4", "F5", "L5", "L7", "F1", "F3", "F4", "L5", "L6", "L7", "F1", "F3", "F5"),
                         timestamp = c(1365895151, 1365969188, 1366105495, 1367433149,
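Row-to-row differences within a group are commonly computed with `lag()` inside a grouped `mutate()`. The sketch below is a minimal version on a shortened data set: the first four timestamps come from the question, while the two rows for code 10889 use hypothetical timestamps because the original values are cut off. Grouping by `code` is an assumption about what "adjacent" should mean.

```r
library(dplyr)

data <- data.frame(
  code      = c(10888, 10888, 10888, 10888, 10889, 10889),
  station   = c("F1", "F3", "F4", "F5", "F1", "F3"),
  timestamp = c(1365895151, 1365969188, 1366105495, 1367433149,
                1365900000, 1365990000)  # last two values are hypothetical
)

res <- data %>%
  arrange(code, timestamp) %>%   # make sure rows are in time order per code
  group_by(code) %>%
  mutate(diff_secs = timestamp - dplyr::lag(timestamp)) %>%  # NA on first row
  ungroup()
```

The timestamps are plain numbers of seconds, so a subtraction is enough; with POSIXct columns, `difftime()` could be used in the same place.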

Subset a list - a plyr way?

柔情痞子 submitted on 2019-12-04 23:45:43
Question: I often have data that is grouped by one or more variables, with several registrations within each group. From the data frame, I wish to select groups according to various criteria. I commonly use a split-sapply-rbind approach, where I extract elements from a list using a logical vector. Here is a small example. I start with a data frame with one grouping variable ('group'), and I wish to select groups that have a maximum mass of less than 45:

    dd <- data.frame(group = rep(letters[1:3], each =
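As an alternative to split-sapply-rbind, a group-level statistic can be broadcast back to every row with `ave()` and used directly as a row filter. This is a base R sketch on made-up data, since the question's data frame is cut off; the column name `mass` and its values are assumptions.

```r
# Made-up data: groups a and c have a maximum mass below 45, group b does not.
dd <- data.frame(group = rep(letters[1:3], each = 3),
                 mass  = c(30, 40, 44,  41, 46, 39,  20, 25, 30))

# ave() returns the per-group maximum repeated for every row of that group.
group_max <- ave(dd$mass, dd$group, FUN = max)

# Keep all rows belonging to groups whose maximum mass is below 45.
sel <- dd[group_max < 45, ]
```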