plyr

Summary data tables from wide data.frames

孤街浪徒 submitted on 2019-12-01 01:12:57
I am trying to find lazy/easy ways of creating summary tables/data.frames from wide data.frames. Assume the following data.frame, but with many more columns, so that specifying the column names takes a long time:
set.seed(2)
x <- data.frame(Rep = rep(1:3, 4),
                Temp = c(rep(10, 6), rep(20, 6)),
                pH = rep(c(rep(8.1, 3), rep(7.6, 3)), 2),
                Var1 = rnorm(12, 5, 2),
                Var2 = c(rnorm(6, 4, 1), rnorm(6, 3, 5)),
                Var3 = rt(12, 20))
# convert the design columns to factors (apply() would coerce them to a character matrix first)
x[1:3] <- lapply(x[1:3], as.factor)
Now I can calculate summary statistics with plyr:
library(plyr)
(mu <- ddply(x, .(Temp, pH), numcolwise(mean)))
(std <- ddply(x, .
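
A minimal sketch in base R (swapped in for plyr so it runs without extra packages), assuming the example data above: the measurement columns are picked by type rather than by name, and aggregate() applies mean and sd to all of them per Temp/pH group. The stat column and the stacking of the two tables into one are illustrative choices, not part of the original question.

```r
set.seed(2)
x <- data.frame(Rep = rep(1:3, 4),
                Temp = c(rep(10, 6), rep(20, 6)),
                pH = rep(c(rep(8.1, 3), rep(7.6, 3)), 2),
                Var1 = rnorm(12, 5, 2),
                Var2 = c(rnorm(6, 4, 1), rnorm(6, 3, 5)),
                Var3 = rt(12, 20))
# Turn the design columns into factors so they are not treated as measurements
x[1:3] <- lapply(x[1:3], as.factor)

# Find the measurement columns by type, so no column name is typed by hand
num_cols <- names(x)[vapply(x, is.numeric, logical(1))]
groups <- x[c("Temp", "pH")]

# One summary table per statistic; every column in num_cols is summarised
mu  <- aggregate(x[num_cols], by = groups, FUN = mean)
std <- aggregate(x[num_cols], by = groups, FUN = sd)

# Stack them into a single long table with a column naming the statistic
summary_tbl <- rbind(cbind(stat = "mean", mu),
                     cbind(stat = "sd", std))
print(summary_tbl)
```

Adding another statistic only means one more aggregate() call with a different FUN; the column list never has to be written out.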

ddply + summarise function column name input

折月煮酒 submitted on 2019-11-30 23:15:39
I am trying to use ddply and summarise together from the plyr package, but am having difficulty passing in column names that keep changing. In my example, I would like something that passes in X1 programmatically rather than hard-coding X1 into the ddply function. Setting up an example:
require(xts)
require(plyr)
require(reshape2)
require(lubridate)
t <- xts(matrix(rnorm(10000), ncol = 10), Sys.Date() - 1000:1)
t.df <- data.frame(coredata(t))
t.df <- cbind(day = wday(index(t), label = TRUE, abbr = TRUE), t.df)
t.df.l <- melt(t.df, id.vars = c("day", colnames(t.df)[2]), measure.vars = colnames(t.df)
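
One way around hard-coding X1 is to pass the column name as a string and look it up with [[ ]] inside the summary function. The sketch below swaps in base R's aggregate() for ddply() so it runs without plyr, xts or lubridate; the data frame df and the helper summarise_by_day() are hypothetical stand-ins for t.df. ddply() also accepts a character vector for its grouping variables, so the same [[ ]] lookup should carry over to a function passed to ddply().

```r
# Hypothetical stand-in for t.df: a day column plus numeric columns X1, X2
set.seed(1)
df <- data.frame(day = rep(c("Mon", "Tue", "Wed"), each = 4),
                 X1 = rnorm(12),
                 X2 = rnorm(12))

# col is a string such as "X1"; df[[col]] resolves it at run time
summarise_by_day <- function(df, col) {
  res <- aggregate(df[[col]], by = list(day = df$day), FUN = mean)
  names(res)[2] <- paste0("mean_", col)  # e.g. "mean_X1"
  res
}

summarise_by_day(df, "X1")

# Loop over every measurement column without naming any of them
value_cols <- setdiff(names(df), "day")
all_means <- lapply(value_cols, summarise_by_day, df = df)
names(all_means) <- value_cols
```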

How do you summarize columns based on unique IDs without knowing IDs in R?

£可爱£侵袭症+ submitted on 2019-11-30 21:25:27
Question: I've been going through the posts about summarizing data, but haven't found what I'm looking for. I want to create a summary "count table" that shows how often a certain medication was given to patients. It doesn't matter that some patients received multiple medications simultaneously, because I simply want a summary of all the medication given, and then to calculate what percentage of all medication given each medication class accounts for. The issue is that I don
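
Since the actual data is cut off above, here is a minimal base-R sketch on a hypothetical long-format data frame meds (one row per medication given, with made-up patient and med_class columns). table() counts every class without knowing the class names in advance, and prop.table() turns the counts into shares of all medication given.

```r
# Hypothetical data: one row per administration; a patient can appear several times
meds <- data.frame(patient   = c(1, 1, 2, 3, 3, 3),
                   med_class = c("antibiotic", "analgesic", "antibiotic",
                                 "analgesic", "antiviral", "antibiotic"),
                   stringsAsFactors = FALSE)

counts <- table(meds$med_class)   # how often each class was given
shares <- prop.table(counts)      # fraction of all medication given

med_summary <- data.frame(med_class = names(counts),
                          n         = as.vector(counts),
                          pct       = round(100 * as.vector(shares), 1),
                          stringsAsFactors = FALSE)
print(med_summary)
```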

How to produce an R count matrix

▼魔方 西西 submitted on 2019-11-30 21:04:12
Question: In R, I can get the counts for the specific columns I am interested in as a data frame, as below.
require("plyr")
bevs <- data.frame(name = c("Bill", "Llib"),
                   drink = c("coffee", "tea", "cocoa", "water"),
                   cost = 1:8)
count(bevs, c("name", "drink"))
# produces
#   name  drink freq
# 1 Bill  cocoa    2
# 2 Bill coffee    2
# 3 Llib    tea    2
# 4 Llib  water    2
How can I get the count result for two specific columns as a matrix which has columns: all unique drinks, rows: all unique names and
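
A sketch using base R's table() instead of plyr::count(): cross-tabulating the two columns gives names as rows and drinks as columns directly, with zeros for combinations that never occur. unclass() drops the table class so the result is a plain integer matrix.

```r
bevs <- data.frame(name  = c("Bill", "Llib"),
                   drink = c("coffee", "tea", "cocoa", "water"),
                   cost  = 1:8)

# Rows: every unique name; columns: every unique drink; cells: counts
count_mat <- unclass(with(bevs, table(name, drink)))
print(count_mat)
```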

How to get the name of a data.frame within a list?

天涯浪子 submitted on 2019-11-30 20:32:11
How can I get a data frame's name from a list? Sure, get() gets the object itself, but I want its name for use within another function. Here's the use case, in case you would rather suggest a workaround:
lapply(somelistOfDataframes, function(X) {
  ddply(X, .(idx, bynameofX), summarise, checkSum = sum(value))
})
Each data frame has a column with the same name as the data frame's name within the list. How can I get this name, bynameofX? names(X) would return the whole vector.
EDIT: Here's a reproducible example:
df1 <- data.frame(value = rnorm(100), cat = c(rep(1,50), rep(2,50))
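
Inside lapply() each element arrives without its list name, so one workaround is to iterate over the data frames and their names together, for example with Map(). The sketch below assumes a data layout (a hypothetical list dfs whose element names match a grouping column, plus idx and value columns, as in the use case) and swaps in base R's aggregate() for ddply().

```r
# Hypothetical list: each element has a column named after the element itself
dfs <- list(
  cat = data.frame(idx = rep(1:2, each = 4), cat = rep(c("x", "y"), 4), value = 1:8),
  dog = data.frame(idx = rep(1:2, each = 4), dog = rep(c("x", "y"), 4), value = 11:18)
)

# Map() hands each data frame X together with its name nm
check_sums <- Map(function(X, nm) {
  out <- aggregate(X["value"], by = X[c("idx", nm)], FUN = sum)
  names(out)[names(out) == "value"] <- "checkSum"
  out
}, dfs, names(dfs))

check_sums$dog
```

Map() keeps the names of its first argument, so the result is still indexed by cat and dog.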

Calculate “group characteristics” without ddply and merge

时间秒杀一切 submitted on 2019-11-30 20:12:21
I wonder whether there is a more straightforward way to calculate a certain type of variable than the approach I normally take. The example below probably explains it best. I have a data frame with 2 columns (fruit, and whether the fruit is rotten or not). For each row, I would like to add, e.g., the percentage of fruit of the same category that is rotten. For example, there are 4 entries for apples and 2 of them are rotten, so each row for apple should read 0.5. The target values (purely as illustration) are included in the "desired outcome" column. I have previously approached this problem by *
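
Base R's ave() computes a group statistic and returns it aligned with the original rows, so no ddply() and merge() round trip is needed. A minimal sketch with made-up data matching the description (4 apple rows, 2 of them rotten); rotten is coded 0/1 here, which is an assumption, so that its group mean is the share of rotten fruit.

```r
fruit <- data.frame(fruit  = c("apple", "apple", "apple", "apple", "pear", "pear", "banana"),
                    rotten = c(1, 0, 1, 0, 1, 1, 0))

# Mean of a 0/1 column within each fruit = share rotten, repeated on every row
fruit$share_rotten <- ave(fruit$rotten, fruit$fruit, FUN = mean)
print(fruit)
```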

ddply + summarise function column name input

穿精又带淫゛_ submitted on 2019-11-30 17:49:53
Question: I am trying to use ddply and summarise together from the plyr package, but am having difficulty passing in column names that keep changing. In my example, I would like something that passes in X1 programmatically rather than hard-coding X1 into the ddply function. Setting up an example:
require(xts)
require(plyr)
require(reshape2)
require(lubridate)
t <- xts(matrix(rnorm(10000), ncol = 10), Sys.Date() - 1000:1)
t.df <- data.frame(coredata(t))
t.df <- cbind(day = wday(index(t), label = TRUE,

Calculate “group characteristics” without ddply and merge

≡放荡痞女 submitted on 2019-11-30 16:57:24
Question: I wonder whether there is a more straightforward way to calculate a certain type of variable than the approach I normally take. The example below probably explains it best. I have a data frame with 2 columns (fruit, and whether the fruit is rotten or not). For each row, I would like to add, e.g., the percentage of fruit of the same category that is rotten. For example, there are 4 entries for apples and 2 of them are rotten, so each row for apple should read 0.5. The target values (purely as

Using plyr::mapvalues with dplyr

人盡茶涼 submitted on 2019-11-30 16:43:20
Question: plyr::mapvalues can be used like this:
mapvalues(mtcars$cyl, c(4, 6, 8), c("a", "b", "c"))
But this doesn't work:
mtcars %>% dplyr::select(cyl) %>% mapvalues(c(4, 6, 8), c("a", "b", "c")) %>% as.data.frame()
How can I use plyr::mapvalues with dplyr? Or, even better, what is the dplyr equivalent?
Answer 1: To use it and return a one-column data.frame:
mtcars %>% transmute(cyl = plyr::mapvalues(cyl, c(4, 6, 8), c("a", "b", "c")))
Or if you want a single vector output, like in your working example, use
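
The piped version probably fails because select() hands mapvalues() a one-column data frame rather than the vector it expects. If the goal is only to avoid plyr, a plain match() lookup does the same recoding; the helper map_values() below is hypothetical (not part of plyr or dplyr) and, like mapvalues(), leaves values that are not in from unchanged.

```r
# Hypothetical helper mimicking plyr::mapvalues(x, from, to) with base R only
map_values <- function(x, from, to) {
  idx <- match(x, from)      # position of each x in `from`, NA if absent
  hit <- !is.na(idx)
  out <- x
  out[hit] <- to[idx[hit]]   # replace only the matched values
  out
}

# Vector output, as in the working mapvalues() call
cyl_letters <- map_values(mtcars$cyl, c(4, 6, 8), c("a", "b", "c"))

# Column-wise, similar in spirit to transmute()/mutate(): transform()
# evaluates cyl inside mtcars and replaces that column
cars2 <- transform(mtcars, cyl = map_values(cyl, c(4, 6, 8), c("a", "b", "c")))
```

Because the replacement values are strings, the whole result is coerced to character, so unmatched numbers come back as text such as "5".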

Calculating hourly averages from a multi-year timeseries

╄→гoц情女王★ submitted on 2019-11-30 15:37:21
I have a dataset filled with the average windspeed per hour for multiple years. I would like to create an 'average year', in which for each hour the average windspeed for that hour over multiple years is calculated. How can I do this without looping endlessly through the dataset? Ideally, I would like to just loop through the data once, extracting for each row the right month, day, and hour, and adding the windspeed from that row to the right row in a dataframe where the aggregates for each month, day, and hour are gathered. Is it possible to do this without extracting the month, day, and hour
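
A vectorised sketch in base R, assuming the timestamps are POSIXct in a column time and the speeds in a column wind (both names, and the simulated data, are hypothetical): format() builds a single month-day-hour key for all rows at once, and aggregate() averages over all years that share a key, so no explicit loop is written. In leap years, 29 February would simply add extra keys of its own.

```r
# Simulated hourly data for three non-leap years, in UTC to avoid DST gaps
wind_df <- data.frame(time = seq(as.POSIXct("2001-01-01 00:00", tz = "UTC"),
                                 as.POSIXct("2003-12-31 23:00", tz = "UTC"),
                                 by = "hour"))
set.seed(4)
wind_df$wind <- runif(nrow(wind_df), min = 0, max = 15)

# One key per hour of the calendar year, e.g. "03-15 07"
wind_df$key <- format(wind_df$time, "%m-%d %H")

# Average over all years for each month/day/hour
avg_year <- aggregate(wind ~ key, data = wind_df, FUN = mean)
head(avg_year)
```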