plyr

Grouping on multiple variables in R

旧巷老猫 submitted on 2019-12-21 17:52:11
Question: I'm a power Excel pivot table user who is forcing himself to learn R. I know exactly how to do this analysis in Excel, but can't figure out the right way to code it in R. I'm trying to group user data by two different variables, grouping those variables into ranges (or bins), and then summarize other variables. Here is what the data looks like:

    userid  visits  posts  revenue
    1       25      0      25
    2       2       2      0
    3       86      7      8
    4       128     24     94
    5       30      5      18
    …       …       …      …
    280000  80      10     100
    280001  42      4      25
    280002  31      8      17

Here is what I
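
A minimal base R sketch of the binning-then-summarizing step described in this question. It uses the first five rows shown above; the bin edges, bin labels, and the names users, visit_bin, post_bin and summary_tbl are assumptions made for illustration, and cut() plus aggregate() stand in here for whatever plyr call the original thread settled on.

```r
# First five rows from the question's sample data
users <- data.frame(
  userid  = 1:5,
  visits  = c(25, 2, 86, 128, 30),
  posts   = c(0, 2, 7, 24, 5),
  revenue = c(25, 0, 8, 94, 18)
)

# Hypothetical bin edges; cut() builds right-closed intervals by default
users$visit_bin <- cut(users$visits,
                       breaks = c(0, 10, 50, Inf),
                       labels = c("1-10", "11-50", "51+"))
users$post_bin <- cut(users$posts,
                      breaks = c(-Inf, 0, 5, Inf),
                      labels = c("0", "1-5", "6+"))

# Total revenue for every combination of bins that actually occurs
summary_tbl <- aggregate(revenue ~ visit_bin + post_bin,
                         data = users, FUN = sum)
print(summary_tbl)
```

Swapping FUN = sum for mean or length gives the other usual pivot-table summaries.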

Producing a rolling average of ALL the previous observations per ID in an unbalanced panel data set

南楼画角 submitted on 2019-12-21 17:00:01
Question: I am trying to compute rolling means of an unbalanced data set. To illustrate my point I have produced this toy example of my data:

    ID  year  Var  RollingAvg(Var)
    1   2000  2    NA
    1   2001  3    2
    1   2002  4    2.5
    1   2003  2    3
    2   2001  2    NA
    2   2002  5    2
    2   2003  4    3.5

The column RollingAvg(Var) is what I want, but can't get. In words, I am looking for the rolling average of ALL the previous observations of Var for each ID. I have tried using rollapply and ddply from the zoo and plyr packages, but I can't see how to
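
A minimal base R sketch of the per-ID average of all previous observations, using the toy data from this question. It relies on ave() and cumsum() rather than the rollapply or ddply calls the question mentions, and it assumes rows can be sorted by year within each ID; the names panel and prev_mean are hypothetical.

```r
# Toy data from the question
panel <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2),
  year = c(2000, 2001, 2002, 2003, 2001, 2002, 2003),
  Var  = c(2, 3, 4, 2, 2, 5, 4)
)

# Make sure observations are in time order within each ID
panel <- panel[order(panel$ID, panel$year), ]

# Mean of all earlier values: running sum shifted by one position,
# divided by the number of earlier observations (NA for the first row)
prev_mean <- function(v) {
  n <- length(v)
  c(NA, cumsum(v)[-n] / seq_len(n - 1))
}

# ave() applies prev_mean separately within each ID
panel$RollingAvg <- ave(panel$Var, panel$ID, FUN = prev_mean)
print(panel)
```

Because the denominator is the count of earlier rows rather than a fixed window, gaps in the years of an unbalanced panel do not need special handling.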

Am I using plyr right? I seem to be using way too much memory

浪子不回头ぞ submitted on 2019-12-21 09:27:47
Question: I have the following, somewhat large dataset:

    > dim(dset)
    [1] 422105     25
    > class(dset)
    [1] "data.frame"
    >

Without doing anything, the R process seems to take about 1 GB of RAM. I am trying to run the following code:

    dset <- ddply(dset, .(tic), transform,
                  date.min <- min(date),
                  date.max <- max(date),
                  daterange <- max(date) - min(date),
                  .parallel = TRUE)

When I run that code, RAM usage skyrockets. It completely saturated 60 GB of RAM on a 32-core machine. What am I doing wrong?

Answer 1: If

Summary statistics using ddply

杀马特。学长 韩版系。学妹 submitted on 2019-12-21 04:58:42
Question: I would like to write a function using ddply that outputs summary statistics based on the names of two columns of the data.frame mat. mat is a big data.frame with columns named "metric", "length", "species", "tree", ..., "index". index is a factor with 2 levels, "Short" and "Long"; "metric", "length", "species", "tree" and the others are all continuous variables. Function:

    summary1 <- function(arg1, arg2) {
      ...
      ss <- ddply(mat, .(index), function(X) data.frame(
        arg1 = as.list(summary(X$arg1)),
        arg2 = as

ddply to multiple columns equivalent in data.table

心不动则不痛 submitted on 2019-12-21 04:31:37
Question: I am a big fan of the data.table package, and I am having trouble converting some code that uses ddply from the plyr package into its data.table equivalent. The ddply code is:

    dfx <- data.frame(
      group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
      sex = sample(c("M", "F"), size = 29, replace = TRUE),
      age = runif(n = 29, min = 18, max = 54),
      age2 = runif(n = 29, min = 18, max = 54)
    )
    ddply(dfx, .(group, sex), numcolwise(sum))

What I want to do is sum across multiple columns without having to

cumsum using ddply

六月ゝ 毕业季﹏ submitted on 2019-12-21 03:42:38
Question: I need to group by levels with ddply, or with aggregate if that's easier. I am not really sure how to do this, as I need to use cumsum as my aggregate function. This is what my data looks like:

    level1  level2  hour  product
    A       tea     0     7
    A       tea     1     2
    A       tea     2     9
    A       coffee  17    7
    A       coffee  18    2
    A       coffee  20    4
    B       coffee  0     2
    B       coffee  1     3
    B       coffee  2     4
    B       tea     21    3
    B       tea     22    1

Expected output:

    A       tea     0     7
    A       tea     1     9
    A       tea     2     18
    A       coffee  17    7
    A       coffee  18    9
    A       coffee  20    13
    B       coffee  0     2
    B       coffee  1     5
    B       coffee  2     9
    B       tea     21    3
    B
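
A minimal base R sketch of the grouped cumulative sum described in this question, using its sample data. It uses ave() with cumsum rather than ddply or aggregate, and it assumes the rows are already in hour order within each level1/level2 group; the names sales and cum_product are hypothetical.

```r
# Sample data from the question
sales <- data.frame(
  level1  = c(rep("A", 6), rep("B", 5)),
  level2  = c("tea", "tea", "tea", "coffee", "coffee", "coffee",
              "coffee", "coffee", "coffee", "tea", "tea"),
  hour    = c(0, 1, 2, 17, 18, 20, 0, 1, 2, 21, 22),
  product = c(7, 2, 9, 7, 2, 4, 2, 3, 4, 3, 1)
)

# ave() keeps one value per input row, so cumsum restarts
# within every level1/level2 combination
sales$cum_product <- ave(sales$product, sales$level1, sales$level2,
                         FUN = cumsum)
print(sales)
```

aggregate() collapses each group to a single row, which is why a row-preserving helper such as ave() is the closer fit for a running total.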

Using dplyr for exploratory plots

一个人想着一个人 submitted on 2019-12-21 02:40:48
Question: I regularly used d_ply to produce exploratory plots. A trivial example:

    require(plyr)
    require(ggplot2)  # qplot() comes from ggplot2
    plot_species <- function(species_data){
      p <- qplot(data=species_data, x=Sepal.Length, y=Sepal.Width)
      print(p)
    }
    d_ply(.data=iris, .variables="Species", function(x)plot_species(x))

This produces three separate plots, one for each species. I would like to reproduce this behaviour using functions in dplyr. This seems to require the reassembly of the data.frame within the function called by summarise, which is

Combine frequency tables into a single data frame

和自甴很熟 submitted on 2019-12-20 10:48:35
Question: I have a list in which each item is a word frequency table derived from using "table()" on a different sample text. Each table is, therefore, a different length. I now want to convert the list into a single data frame in which each column is a word and each row is a sample text. Here is a dummy example of my data:

    t1<-table(strsplit(tolower("this is a test in the event of a real word file you would see many more words here"), "\\W"))
    t2<-table(strsplit(tolower("Four score and seven years ago
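
A minimal base R sketch of one way to line up frequency tables of different lengths, as this question describes. The two short sample strings are made up for illustration (the question's own second text is cut off above), and the names texts, tabs, words, freq and freq_df are hypothetical.

```r
# Hypothetical sample texts standing in for the question's data
texts <- c("this is a test", "a test of a word file")

# One word-frequency table per text, as in the question
tabs <- lapply(strsplit(tolower(texts), "\\W+"), table)

# The union of all words becomes the set of columns
words <- sort(unique(unlist(lapply(tabs, names))))

# Expand each table to the full vocabulary, filling absent words with 0
freq <- t(sapply(tabs, function(tb) {
  counts <- setNames(integer(length(words)), words)
  counts[names(tb)] <- as.integer(tb)
  counts
}))
rownames(freq) <- paste0("text", seq_along(tabs))

# One row per sample text, one column per word
freq_df <- as.data.frame(freq)
print(freq_df)
```

Building the vocabulary first means every row has the same columns, so the tables can be stacked without any padding step afterwards.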

Error when calculating values greater than 95% quantile using plyr

两盒软妹~` submitted on 2019-12-20 06:39:16
Question: My data is structured as follows:

    Individ <- data.frame(
      Participant = c("Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill",
                      "Harry", "Harry", "Harry", "Harry", "Harry", "Harry", "Harry", "Harry",
                      "Paul", "Paul", "Paul", "Paul"),
      Time = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
      Condition = c("Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Placebo", "Placebo",

Interpolate variables on subsets of dataframe

我的梦境 submitted on 2019-12-20 05:47:11
Question: I have a large data frame containing survey observations from multiple states over several years. Here's the data structure:

    state | survey.year | time1 | obs1 | time2 | obs2
    CA    | 2000        | 1     | 23   | 1.2   | 43
    CA    | 2001        | 2     | 43   | 1.4   | 52
    CA    | 2002        | 5     | 53   | 3.2   | 61
    ...
    CA    | 1998        | 3     | 12   | 2.3   | 20
    CA    | 1999        | 4     | 14   | 2.8   | 25
    CA    | 2003        | 5     | 19   | 4.3   | 29
    ...
    ND    | 2000        | 2     | 223  | 3.2   | 239
    ND    | 2001        | 4     | 233  | 4.2   | 321
    ND    | 2003        | 7     | 256  | 7.9   | 387

For each state/survey.year