plyr

ddply run in a function looks in the environment outside the function?

☆樱花仙子☆ Submitted on 2019-12-11 01:37:39
Question: I'm trying to write a function to do some often-repeated analysis, and one part of this is to count the number of groups and the number of members within each group, so ddply to the rescue! However, my code has a problem... Here is some example data: > dput(BGBottles) structure(list(Machine = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), weight = c(14.23, 14.96, 14.85, 16.46, 16.74, 15.94, 14.98, 14.88, 14.87, 15.94, 16.07, 14.91
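A sketch of the usual fix for this scoping problem, using the data reconstructed from the truncated dput() above: pass the grouping column to ddply as a character vector, so it is evaluated against the data-frame argument rather than against the function's calling environment.

```r
library(plyr)

# Data reconstructed from the question's dput() output.
BGBottles <- data.frame(
  Machine = factor(rep(1:4, each = 3)),
  weight  = c(14.23, 14.96, 14.85, 16.46, 16.74, 15.94,
              14.98, 14.88, 14.87, 15.94, 16.07, 14.91)
)

# Passing the grouping column as a string sidesteps the
# non-standard-evaluation scoping issue inside a function body.
group_sizes <- function(df, group_col) {
  ddply(df, group_col, nrow)
}

group_sizes(BGBottles, "Machine")
```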

using summarize in ddply to get entire row based on max() of one column

若如初见. Submitted on 2019-12-10 23:31:31
Question: df1 primer timepoints mean sde Acan 0 1.0000000 0.000000e+00 Acan 20 0.8758265 7.856192e-02 Acan 40 1.0575400 4.680159e-02 Acan 60 1.2399106 2.238616e-01 Acan 120 1.1710685 2.085558e-02 Acan 240 1.6430670 NA Acan 360 1.7747940 NA All I want is the max value of mean (for any of these timepoints) with its corresponding sde. ## this will only get me the mean, obviously x <- ddply(x, .(primer), summarize, max = max(mean)) primer max Acan 1.774794 ## if I were to do this I would obviously not have
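A sketch, assuming the goal is the whole row at the maximal mean: instead of summarize (which drops the other columns), have the per-group function return a subset of the piece, so sde and timepoints come along.

```r
library(plyr)

# Data transcribed from the question's printed df1.
df1 <- data.frame(
  primer     = "Acan",
  timepoints = c(0, 20, 40, 60, 120, 240, 360),
  mean       = c(1.0000000, 0.8758265, 1.0575400, 1.2399106,
                 1.1710685, 1.6430670, 1.7747940),
  sde        = c(0, 7.856192e-02, 4.680159e-02, 2.238616e-01,
                 2.085558e-02, NA, NA)
)

# which.max() gives the row index of the largest mean within each
# piece; returning that row keeps every column of it.
x <- ddply(df1, .(primer), function(p) p[which.max(p$mean), ])
```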

Fast crosstabs and stats on all pairs of variables

别说谁变了你拦得住时间么 Submitted on 2019-12-10 23:00:03
Question: I am trying to calculate a measure of association between all variables in a data.table. (This is not a stats question, but as an aside: the variables are all factors, and the measure is Cramér's V.) Example dataset: p = 50; n = 1e5; # actual dataset has p > 1e3, n > 1e5, much wider but barely longer set.seed(1234) obs <- as.data.table( data.frame( cbind( matrix(sample(c(LETTERS[1:4],NA), n*(p/2), replace=TRUE), nrow=n, ncol=p/2), matrix(sample(c(letters[1:6],NA), n*(p/2), replace=TRUE),
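Not a fast solution for p > 1e3, just a minimal sketch of the statistic itself on a small invented dataset: Cramér's V from an uncorrected chi-squared test on each pair's two-way table (NA entries are dropped by table() here).

```r
# Cramér's V for two categorical vectors: sqrt(chi^2 / (n * (k - 1))),
# where k is the smaller of the table's two dimensions.
cramers_v <- function(x, y) {
  tab <- table(x, y)
  chi <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
  unname(sqrt(chi / (sum(tab) * (min(dim(tab)) - 1))))
}

set.seed(1234)
obs <- data.frame(a = sample(c(LETTERS[1:4], NA), 1000, replace = TRUE),
                  b = sample(c(letters[1:6], NA), 1000, replace = TRUE),
                  c = sample(c(LETTERS[1:4], NA), 1000, replace = TRUE))

# All unordered pairs of columns, and V for each pair.
pairs <- combn(names(obs), 2)
v <- apply(pairs, 2, function(p) cramers_v(obs[[p[1]]], obs[[p[2]]]))
```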

Error thrown within ddply crashes R

假装没事ソ Submitted on 2019-12-10 20:33:01
Question: I'm running into an issue where plyr consistently crashes when an error is thrown from the supplied function: > require(plyr) Loading required package: plyr Warning message: package ‘plyr’ was built under R version 3.0.2 > df <- data.frame(group=c("A","A","B","B"), num=c(11,22,33,44)) > ddply(df, .(group), function(x) {x}) group num 1 A 11 2 A 22 3 B 33 4 B 44 > ddply(df, .(group), function(x) {stop("badness")}) called from: (function () { .rs.breakOnError(TRUE) })() Error in .fun(piece, ...)
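The traceback in the excerpt points at an RStudio error hook (.rs.breakOnError) rather than plyr itself. As a workaround sketch, the supplied function can trap its own errors with tryCatch and return them as data, so no error ever escapes the ddply call:

```r
library(plyr)

df <- data.frame(group = c("A", "A", "B", "B"), num = c(11, 22, 33, 44))

# A worker that always fails, as in the question; tryCatch converts
# each error into a one-row data frame instead of aborting ddply.
safe_fun <- function(x) {
  tryCatch(stop("badness"),
           error = function(e) data.frame(error = conditionMessage(e)))
}

res <- ddply(df, .(group), safe_fun)
```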

How to calculate average values for large datasets

不想你离开。 Submitted on 2019-12-10 18:38:41
Question: I am working with a dataset that has temperature readings once an hour, 24 hours a day, for 100+ years. I want to compute an average temperature for each day to reduce the size of my dataset. The headings look like this: YR MO DA HR MN TEMP 1943 6 19 10 0 73 1943 6 19 11 0 72 1943 6 19 12 0 76 1943 6 19 13 0 78 1943 6 19 14 0 81 1943 6 19 15 0 85 1943 6 19 16 0 85 1943 6 19 17 0 86 1943 6 19 18 0 86 1943 6 19 19 0 87 etc. for 600,000+ data points. How can I run a nested function to calculate daily
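A base-R sketch on a few rows in the question's column layout (values beyond the excerpt are invented): aggregate() collapses the hourly readings to one mean per (YR, MO, DA); ddply(temps, .(YR, MO, DA), summarise, TEMP = mean(TEMP)) is the plyr equivalent.

```r
# A few hourly rows in the question's layout; the first three TEMP
# values come from the excerpt, the rest are made up.
temps <- data.frame(
  YR   = 1943, MO = 6,
  DA   = c(19, 19, 19, 20, 20),
  HR   = c(10, 11, 12, 10, 11),
  MN   = 0,
  TEMP = c(73, 72, 76, 80, 82)
)

# One mean temperature per calendar day.
daily <- aggregate(TEMP ~ YR + MO + DA, data = temps, FUN = mean)
```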

Aggregating duplicate rows by taking sum

偶尔善良 Submitted on 2019-12-10 16:57:12
Question: Following on from my questions: 1. Identifying whether a set of variables uniquely identifies each row of the data or not; 2. Tagging all rows that are duplicates in terms of a given set of variables — I would now like to aggregate/consolidate all the duplicate rows in terms of a given set of variables by taking their sum. Solution 1: There is some guidance on how to do this here, but when there are a large number of levels of the variables that form the index, the ddply method recommended
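When ddply is slow over an index with many levels, base aggregate() (or data.table's grouped sum) is the usual replacement. A minimal base-R sketch with hypothetical columns id1, id2, val:

```r
# Hypothetical data: the first two rows are duplicates of the
# (id1, id2) index and should be collapsed into one.
df <- data.frame(id1 = c("a", "a", "b"),
                 id2 = c(1, 1, 2),
                 val = c(10, 5, 3))

# Sum val within each (id1, id2) combination, consolidating duplicates.
agg <- aggregate(val ~ id1 + id2, data = df, FUN = sum)
```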

Add an index (or counter) to a dataframe by group in R [duplicate]

给你一囗甜甜゛ Submitted on 2019-12-10 16:39:06
Question: This question already has answers here: Numbering rows within groups in a data frame (6 answers). Closed 3 years ago. I have a df like ProjectID Dist 1 x 1 y 2 z 2 x 2 h 3 k .... .... I want to add a third column such that we have an incrementing counter for each ProjectID: ProjectID Dist counter 1 x 1 1 y 2 2 z 1 2 x 2 2 h 3 3 k 1 .... .... I've had a look at seq, rank, and a couple of other bits, particularly looking to see if I could use ddply to help: df$counter <- ddply(df,.(projectID),
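A base-R sketch of the within-group counter (note that ddply returns a whole new data frame, so assigning its result to a single column as in the excerpt will not work): ave() with seq_along numbers the rows inside each ProjectID.

```r
df <- data.frame(ProjectID = c(1, 1, 2, 2, 2, 3),
                 Dist = c("x", "y", "z", "x", "h", "k"))

# seq_along restarts at 1 within each ProjectID group.
df$counter <- ave(seq_along(df$ProjectID), df$ProjectID, FUN = seq_along)
```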

Conditional Cross tabulation in R

99封情书 Submitted on 2019-12-10 15:56:13
Question: I'm looking for the quickest way to achieve the task below using the "expss" package. With "expss" we can easily do cross-tabulation (the package has other advantages and useful cross-tabulation functions too), and we can cross-tabulate multiple variables like this: #install.packages("expss") library("expss") data(mtcars) var1 <- "vs, am, gear, carb" var_names = trimws(unlist(strsplit(var1, split = ","))) mtcars %>% tab_prepend_values %>% tab_cols(total(), ..[(var_names)]) %>% tab
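The expss pipeline in the excerpt is cut off. As a package-agnostic sketch of the same variable-name handling, the parsed names can drive plain table() calls (expss's tab_cols() pipeline would consume the same var_names vector):

```r
data(mtcars)

# Parse the comma-separated variable list, as in the question.
var1 <- "vs, am, gear, carb"
var_names <- trimws(unlist(strsplit(var1, split = ",")))

# One frequency table per named variable, base R only.
tabs <- lapply(var_names, function(v) table(mtcars[[v]]))
names(tabs) <- var_names
```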

Add simulated poisson distributions to a ggplot

限于喜欢 Submitted on 2019-12-10 15:53:50
Question: I have fitted a Poisson regression and then visualised the model: library(ggplot2) year <- 1990:2010 count <- c(29, 8, 13, 3, 20, 14, 18, 15, 10, 19, 17, 18, 24, 47, 52, 24, 25, 24, 31, 56, 48) df <- data.frame(year, count) my_glm <- glm(count ~ year, family = "poisson", data = df) my_glm$model$fitted <- predict(my_glm, type = "response") ggplot(my_glm$model) + geom_point(aes(year, count)) + geom_line(aes(year, fitted)) Now I want to add these simulated Poisson distributions to the plot:
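One reading of the question (an assumption, since the excerpt stops before showing the simulations): "simulated Poisson distributions" means draws from the fitted model, which can be overlaid by sampling rpois() at each year's fitted mean and plotting the draws as jittered points.

```r
library(ggplot2)

# Model and data as in the question.
year  <- 1990:2010
count <- c(29, 8, 13, 3, 20, 14, 18, 15, 10, 19, 17, 18, 24, 47, 52,
           24, 25, 24, 31, 56, 48)
df <- data.frame(year, count)
my_glm <- glm(count ~ year, family = "poisson", data = df)
df$fitted <- predict(my_glm, type = "response")

# Ten Poisson draws per year, each centred on that year's fitted mean.
set.seed(1)
sims <- data.frame(year = rep(df$year, 10),
                   sim  = rpois(10 * nrow(df), rep(df$fitted, 10)))

p <- ggplot(df) +
  geom_point(aes(year, count)) +
  geom_line(aes(year, fitted)) +
  geom_jitter(data = sims, aes(year, sim), alpha = 0.2, width = 0.2)
```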

Dummy for first new element in a series

送分小仙女□ Submitted on 2019-12-10 14:59:00
Question: Suppose I have a variable that lasts for several periods, like the number of years I have owned an iPod. Say I had the Ipod1 (first generation) from 2001 until 2004, then got the Ipod2 in 2005, and so on. So my dataframe would look like: 2001 Ipod1 2002 Ipod1 2003 Ipod1 2004 Ipod1 2005 Ipod2 2006 Ipod2 2007 Ipod2 2008 Ipod2 2009 Ipod3 2010 Ipod3 What I want is to create a dummy for the period when a new variable arrives, so I would get: Year Var Dummy 2001 Ipod1 1 2002 Ipod1 0 2003 Ipod1 0 2004
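A base-R sketch: comparing each Var with the previous row flags the first year of every new spell (the first row always counts as new).

```r
df <- data.frame(Year = 2001:2010,
                 Var  = c(rep("Ipod1", 4), rep("Ipod2", 4), rep("Ipod3", 2)),
                 stringsAsFactors = FALSE)

# 1 whenever Var differs from the row above; the leading "" makes the
# first row compare unequal, so it is always marked 1.
df$Dummy <- as.integer(df$Var != c("", head(df$Var, -1)))
```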