plyr

Summarize dataframe by day from timestamp

不打扰是莪最后的温柔 submitted on 2019-11-29 17:05:18
I have a dataset, data, that contains a timestamp and a suite of other variables with values at each timestamp. I am trying to use ddply from plyr to create a new dataframe that summarizes a variable (e.g. its mean) by day. How can I get ddply to group by day? Or how can I create a grouping variable from the day (%d) within the timestamp? The resulting dataframe would contain the average values for each day present in data.

    library(plyr)
    data <- read.csv("data.csv", header=TRUE)
    data$TIMESTAMP <- strptime(data$TIMESTAMP, "%m/%d/%Y %H:%M")
    ddply(data,.(DAY)
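
One possible approach, sketched under assumptions: derive a DAY column from the timestamp and pass it to ddply. The sample rows and the column name value are made up for illustration (they stand in for data.csv), and as.POSIXct is used here instead of strptime because POSIXlt columns tend to cause trouble inside data frames.

```r
library(plyr)

# Hypothetical sample rows standing in for data.csv
data <- data.frame(
  TIMESTAMP = c("01/01/2019 00:00", "01/01/2019 12:00", "01/02/2019 06:00"),
  value = c(1, 3, 10),
  stringsAsFactors = FALSE
)

# Parse to POSIXct, then derive a calendar-day grouping variable
data$TIMESTAMP <- as.POSIXct(data$TIMESTAMP, format = "%m/%d/%Y %H:%M", tz = "UTC")
data$DAY <- as.Date(data$TIMESTAMP, tz = "UTC")

# One row per day with the mean of the (hypothetical) value column
daily <- ddply(data, .(DAY), summarise, mean_value = mean(value))
```

Grouping on the full date rather than on %d alone keeps the 1st of January apart from the 1st of February.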

Summary of proportions by group

隐身守侯 submitted on 2019-11-29 16:40:31
What would be the best tool/package to use to calculate proportions by subgroups? I thought I could try something like this:

    data(mtcars)
    library(plyr)
    ddply(mtcars, .(cyl), transform, Pct = gear/length(gear))

But the output is not what I want: I want a result with as many rows as there are values of cyl. Even if I change it to summarise, I still get the same problem. I am open to other packages, but I thought plyr would be best as I would eventually like to build a function around this. Any ideas? I'd appreciate any help just solving a basic problem like this.

    library(dplyr)
    mtcars %>% count
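
A minimal sketch of one way to get one row per cyl with plyr, assuming the proportion wanted is each group's share of all rows (the question does not pin this down). The column names n and Pct are illustrative.

```r
library(plyr)
data(mtcars)

# One row per cyl: the group's row count and its share of all rows
pct_by_cyl <- ddply(mtcars, .(cyl), function(d) {
  data.frame(n = nrow(d), Pct = nrow(d) / nrow(mtcars))
})
```

Returning a one-row data frame from the function is what collapses each group to a single row; transform keeps every original row instead.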

Population pyramid plot with ggplot2 and dplyr (instead of plyr)

╄→гoц情女王★ submitted on 2019-11-29 15:20:49
I am trying to reproduce the simple population pyramid from the post "Simpler population pyramid in ggplot2" using ggplot2 and dplyr (instead of plyr). Here is the original example with plyr and a seed:

    set.seed(321)
    test <- data.frame(v=sample(1:20,1000,replace=T), g=c('M','F'))
    require(ggplot2)
    require(plyr)
    ggplot(data=test,aes(x=as.factor(v),fill=g)) +
      geom_bar(subset=.(g=="F")) +
      geom_bar(subset=.(g=="M"),aes(y=..count..*(-1))) +
      scale_y_continuous(breaks=seq(-40,40,10),labels=abs(seq(-40,40,10))) +
      coord_flip()

Works fine. But how can I generate this same plot with dplyr instead? The
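
A sketch of one dplyr-based variant, under the assumption that pre-computing the counts is acceptable: instead of the subset argument, the counts are tabulated with count(), negated for one group, and drawn with geom_col. This is a swapped-in technique, not the original post's method.

```r
library(ggplot2)
library(dplyr)

set.seed(321)
test <- data.frame(v = sample(1:20, 1000, replace = TRUE), g = c("M", "F"))

# Count rows per value and group, then flip the sign for one group
counts <- test %>%
  count(v, g) %>%
  mutate(n = ifelse(g == "M", -n, n))

p <- ggplot(counts, aes(x = factor(v), y = n, fill = g)) +
  geom_col() +
  scale_y_continuous(breaks = seq(-40, 40, 10), labels = abs(seq(-40, 40, 10))) +
  coord_flip()
```

Printing p draws the pyramid; the axis labels show absolute counts even though one side is stored as negative values.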

data.table or dplyr - data manipulation

久未见 submitted on 2019-11-29 14:41:07
Question: I have the following data:

    Date        Col1  Col2
    2014-01-01  123   12
    2014-01-01  123   21
    2014-01-01  124   32
    2014-01-01  125   32
    2014-01-02  123   34
    2014-01-02  126   24
    2014-01-02  127   23
    2014-01-03  521   21
    2014-01-03  123   13
    2014-01-03  126   15

Now, I want to count the unique values in Col1 for each date (only those that did not appear on a previous date), and add that to the previous count. For example:

    Date        Count
    2014-01-01  3      i.e. 123, 124, 125
    2014-01-02  5      (2 + above 3) i.e. 126, 127
    2014-01-03  6      (1 + above 5) i.e. 521 only

Answer 1:
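
A minimal dplyr sketch of one way to get these counts (not necessarily the approach of the truncated answer): keep only the first occurrence of each Col1 value, count first occurrences per date, and take a running total.

```r
library(dplyr)

df <- data.frame(
  Date = rep(as.Date(c("2014-01-01", "2014-01-02", "2014-01-03")), times = c(4, 3, 3)),
  Col1 = c(123, 123, 124, 125, 123, 126, 127, 521, 123, 126),
  Col2 = c(12, 21, 32, 32, 34, 24, 23, 21, 13, 15)
)

result <- df %>%
  arrange(Date) %>%
  filter(!duplicated(Col1)) %>%  # first appearance of each Col1 value
  count(Date) %>%                # new values per date, stored in column n
  mutate(Count = cumsum(n))      # running total across dates
```

Note that a date contributing no new values would be absent from result rather than repeating the previous count.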

R resetting a cumsum to zero at the start of each year

一笑奈何 submitted on 2019-11-29 14:12:39
I have a dataframe with a bunch of donations data. I arrange the data in time order, from the oldest to the most recent gifts. Next I add a column containing a cumulative sum of the gifts over time. The data spans multiple years, and I am looking for a good way to reset the cumsum to 0 at the start of each year (the year starts and ends on July 1st for fiscal purposes). This is how it currently is:

    id   date        giftamt  cumsum()
    005  01-05-2001  20.00    20.00
    007  06-05-2001  25.00    45.00
    009  12-05-2001  20.00    65.00
    012  02-05-2002  30.00    95.00
    015  08-05-2002  50.00    145.00
    025  12-05-2002  25.00    170.00
    ...
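
One way this could be sketched with plyr: compute a fiscal-year label and take the cumulative sum within each label. The assumptions here are that the dates are month-day-year and that a fiscal year is named after the calendar year in which it ends; the column names fy and fy_cumsum are illustrative.

```r
library(plyr)

gifts <- data.frame(
  id = c("005", "007", "009", "012", "015", "025"),
  date = as.Date(c("01-05-2001", "06-05-2001", "12-05-2001",
                   "02-05-2002", "08-05-2002", "12-05-2002"),
                 format = "%m-%d-%Y"),
  giftamt = c(20, 25, 20, 30, 50, 25),
  stringsAsFactors = FALSE
)

# Fiscal year starts July 1: months 7-12 belong to the next year's label
month <- as.integer(format(gifts$date, "%m"))
year <- as.integer(format(gifts$date, "%Y"))
gifts$fy <- ifelse(month >= 7, year + 1, year)

# Cumulative sum restarts within each fiscal year
gifts <- ddply(gifts, .(fy), transform, fy_cumsum = cumsum(giftamt))
```

The data should already be sorted by date before the ddply call, since cumsum follows row order within each group.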

round_any equivalent for dplyr?

老子叫甜甜 submitted on 2019-11-29 14:05:42
I am trying to switch to the "new" tidyverse ecosystem and avoid loading the old packages from Wickham et al. that I used to rely on. I found the round_any function from plyr useful in many cases where I needed custom rounding for plots, tables, etc. E.g.

    x <- c(1.1, 1.0, 0.99, 0.1, 0.01, 0.001)
    library(plyr)
    round_any(x, 0.1, floor)
    # [1] 1.1 1.0 0.9 0.1 0.0 0.0

Is there an equivalent of plyr's round_any function in the tidyverse? ggplot2::cut_width, as pointed out in one of the comments, does not even return a numeric vector, but a factor instead. So it is no
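
If no drop-in replacement is available, a small base R helper may be enough. The sketch below assumes the rounding is simply "divide by the accuracy, apply a rounding function, multiply back"; round_to is a hypothetical name, not a tidyverse function.

```r
# Hypothetical helper: round x to a multiple of accuracy using f
round_to <- function(x, accuracy, f = round) {
  f(x / accuracy) * accuracy
}

x <- c(1.1, 1.0, 0.99, 0.1, 0.01, 0.001)
rounded <- round_to(x, 0.1, floor)
```

Because it returns a plain numeric vector, it can be used inside dplyr::mutate() without loading plyr.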

Block bootstrap from subject list

一个人想着一个人 submitted on 2019-11-29 13:16:30
I'm trying to efficiently implement a block bootstrap technique to get the distribution of regression coefficients. The main outline is as follows. I have a panel data set, and say firm and year are the indices. For each iteration of the bootstrap, I wish to sample n subjects with replacement. From this sample, I need to construct a new data frame that is an rbind() stack of all the observations for each sampled subject, run the regression, and pull out the coefficients. Repeat for a bunch of iterations, say 100. Each firm can potentially be selected multiple times, so I need to include it
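
The outline above could be sketched in base R roughly as follows. The panel data, the column names x and y, and the regression formula are all made up for illustration; splitting the data by firm once up front avoids repeated subsetting inside the loop.

```r
set.seed(1)

# Hypothetical panel: 10 firms observed over 5 years
panel <- data.frame(firm = rep(1:10, each = 5), year = rep(2001:2005, times = 10))
panel$x <- rnorm(nrow(panel))
panel$y <- 2 * panel$x + rnorm(nrow(panel))

# Split once so each draw only needs list indexing
by_firm <- split(panel, panel$firm)

boot_once <- function() {
  # Sample firms with replacement; a firm drawn twice contributes its block twice
  ids <- sample(names(by_firm), length(by_firm), replace = TRUE)
  boot_df <- do.call(rbind, unname(by_firm[ids]))
  coef(lm(y ~ x, data = boot_df))
}

# 100 bootstrap iterations, one row of coefficients per iteration
coefs <- t(replicate(100, boot_once()))
```

Each column of coefs then approximates the bootstrap distribution of one coefficient.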

Easiest way to subtract values associated with one factor level from values associated with all other factor levels

假如想象 submitted on 2019-11-29 12:47:30
I've got a dataframe containing rates for 'live' treatments and rates for 'killed' treatments. I'd like to subtract the killed treatments from the live ones:

    df <- data.frame(id1=gl(2, 3, labels=c("a", "b")),
                     id2=rep(gl(3, 1, labels=c("live1", "live2", "killed")), 2),
                     y=c(10, 10, 1, 12, 12, 2),
                     otherFactor = gl(3, 2))

I'd like to subtract the values of y for which id2=="killed" from all the other values of y, separated by the levels of id1, while preserving otherFactor. I would end up with

    id1  id2    y   otherFactor
    a    live1  9   1
    a    live2  9   1
    b    live1  10  2
    b    live2  10  3

This almost works: df_minusKill
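
A minimal plyr sketch of one way to do this, assuming there is exactly one 'killed' row per level of id1: within each id1 group, look up the killed value, drop that row, and subtract it from the remaining rows.

```r
library(plyr)

df <- data.frame(id1 = gl(2, 3, labels = c("a", "b")),
                 id2 = rep(gl(3, 1, labels = c("live1", "live2", "killed")), 2),
                 y = c(10, 10, 1, 12, 12, 2),
                 otherFactor = gl(3, 2))

res <- ddply(df, .(id1), function(d) {
  killed <- d$y[d$id2 == "killed"]   # assumes a single killed row per group
  live <- d[d$id2 != "killed", ]     # keeps otherFactor untouched
  live$y <- live$y - killed
  live
})
```

Because the function returns the live rows themselves, every other column (here otherFactor) is carried along unchanged.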

renaming the output column with the plyr package in R

前提是你 submitted on 2019-11-29 12:44:49
Question: Hadley turned me on to the plyr package and I find myself using it all the time to do 'group by' sort of stuff. But I find myself always having to rename the resulting columns, since they default to V1, V2, etc. Here's an example:

    mydata <- data.frame(matrix(rnorm(144, mean=2, sd=2),72,2),
                         c(rep("A",24),rep("B",24),rep("C",24)))
    colnames(mydata) <- c("x_value", "acres", "state")
    groupAcres <- ddply(mydata, c("state"), function(df)c(sum(df$acres)))
    colnames(groupAcres) <- c("state","stateAcres")
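
One way to avoid the renaming step, sketched here with a simplified version of the data above (the seed is added only for reproducibility): name the output inside the call, either with summarise or by returning a data frame with a named column.

```r
library(plyr)

set.seed(42)
mydata <- data.frame(
  x_value = rnorm(72, mean = 2, sd = 2),
  acres = rnorm(72, mean = 2, sd = 2),
  state = rep(c("A", "B", "C"), each = 24)
)

# Option 1: summarise names the output column directly
groupAcres <- ddply(mydata, "state", summarise, stateAcres = sum(acres))

# Option 2: return a data frame with a named column from the function
groupAcres2 <- ddply(mydata, "state", function(df) {
  data.frame(stateAcres = sum(df$acres))
})
```

Both calls give columns state and stateAcres, so the colnames() assignment is no longer needed.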

loading dplyr after plyr is causing issues

女生的网名这么多〃 submitted on 2019-11-29 12:35:25
Test case:

    library(dplyr)
    library(plyr)
    library(dplyr)
    mtcars%>%rename(x=gear)

This gives an error. Any help would be greatly appreciated.

Based on @hadley's tweet, the best answer is to ALWAYS load plyr before dplyr, and not to load plyr again. Pasting his tweet for reference:

    Hadley Wickham @hadleywickham Jul 27
    @gunapemmaraju just load plyr before dplyr?

I have this problem when I require plyr again while sourcing files. You can do:

    if("dplyr" %in% (.packages())){
      detach("package:dplyr", unload=TRUE)
      detach("package:plyr", unload=TRUE)
    }
    library(plyr)
    library(dplyr)

Source: https://stackoverflow.com/questions
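
As a complementary sketch (not part of the quoted answer): calling the function with an explicit namespace prefix sidesteps the masking problem regardless of load order, since plyr and dplyr both export a function called rename.

```r
library(plyr)
library(dplyr)

# Explicit namespace: always dplyr's rename(), whatever is masked on the search path
renamed <- dplyr::rename(mtcars, x = gear)
```

The same prefixing works for other clashing names such as summarise and mutate.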