plyr | 易学教程

Summary data tables from wide data.frames

I am trying to find lazy/easy ways of creating summary tables (data.frames) from wide data.frames. Assume the following data.frame, but with many more columns, so that specifying the column names takes a long time:
    set.seed(2)
    x <- data.frame(Rep = rep(1:3, 4),
                    Temp = c(rep(10, 6), rep(20, 6)),
                    pH = rep(c(rep(8.1, 3), rep(7.6, 3)), 2),
                    Var1 = rnorm(12, 5, 2),
                    Var2 = c(rnorm(6, 4, 1), rnorm(6, 3, 5)),
                    Var3 = rt(12, 20))
    # Convert the three design columns to factors
    x[1:3] <- lapply(x[1:3], factor)
Now I can calculate summary statistics with plyr:
    (mu <- ddply(x, .(Temp, pH), numcolwise(mean)))
    (std <- ddply(x, .

ddply + summarise function column name input

I am trying to use ddply and summarise together from the plyr package, but I am having difficulty passing in column names that keep changing. In my example I would like something that passes in X1 programmatically rather than hard-coding X1 into the ddply call. Setting up an example:
    require(xts)
    require(plyr)
    require(reshape2)
    require(lubridate)
    t <- xts(matrix(rnorm(10000), ncol = 10), Sys.Date() - 1000:1)
    t.df <- data.frame(coredata(t))
    t.df <- cbind(day = wday(index(t), label = TRUE, abbr = TRUE), t.df)
    t.df.l <- melt(t.df, id.vars=c("day",colnames(t.df)[2]), measure.vars=colnames(t.df)

How do you summarize columns based on unique IDs without knowing IDs in R?

Question: I've been going through the posts about summarizing data, but haven't found what I'm looking for. I wish to create a summary "count table" that shows how often a certain medication was given to patients. The fact that some patients received multiple medications simultaneously doesn't matter, because I simply want a summary of all the medication given, and then to calculate what percentage of all medication given each medication class accounts for. The issue is, that I don

How to produce an R count matrix

Question: In R, I can return the count results for the specific column names I am interested in, as below:
    require("plyr")
    bevs <- data.frame(cbind(name = c("Bill", "Llib"), drink = c("coffee", "tea", "cocoa", "water"), cost = seq(1:8)))
    count(bevs, c("name", "drink"))
    # produces
      name  drink freq
    1 Bill  cocoa    2
    2 Bill coffee    2
    3 Llib    tea    2
    4 Llib  water    2
How can I get the count result of two specific column names in a matrix which has columns: all unique drinks, rows: all unique names and
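The excerpt stops before any answer, so the following is only a minimal sketch of one possible approach, using base R's table() rather than plyr. It rebuilds the example with data.frame() directly (an assumption made here for simplicity; the original wraps the columns in cbind()), and the name count_matrix is hypothetical.

```r
# Rebuild the example data; data.frame() recycles the shorter vectors to 8 rows.
bevs <- data.frame(name  = c("Bill", "Llib"),
                   drink = c("coffee", "tea", "cocoa", "water"),
                   cost  = 1:8)

# Cross-tabulate: rows are the unique names, columns are the unique drinks.
counts <- table(bevs$name, bevs$drink)

# Drop the "table" class to get a plain integer matrix with the same dimnames.
count_matrix <- unclass(counts)
print(count_matrix)
```

Combinations that never occur, such as Bill and tea, appear as 0 in the matrix, which count() would simply omit.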

How to get the name of a data.frame within a list?

How can I get a data frame's name from a list? Sure, get() gets the object itself, but I want its name for use within another function. Here's the use case, in case you would rather suggest a workaround:
    lapply(somelistOfDataframes, function(X) {
      ddply(X, .(idx, bynameofX), summarise, checkSum = sum(value))
    })
There is a column in each data frame that goes by the same name as the data frame within the list. How can I get this name, bynameofX? names(X) would return the whole vector. EDIT: Here's a reproducible example:
    df1 <- data.frame(value = rnorm(100), cat = c(rep(1,50), rep(2,50))
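The reproducible example is cut off, so the sketch below uses a small hypothetical list (dfs, with entries cat and dog) and base R's aggregate() in place of ddply. The idea it illustrates is to iterate over names(dfs) instead of over the list itself, so that each name is available inside the function.

```r
# Hypothetical list: each data frame has a column named after its list entry.
dfs <- list(
  cat = data.frame(idx = c(1, 1, 2), cat = c("a", "a", "b"), value = 1:3),
  dog = data.frame(idx = c(1, 2, 2), dog = c("x", "y", "y"), value = 4:6)
)

# Loop over the names rather than the elements; look each data frame up by name.
sums <- lapply(names(dfs), function(nm) {
  X <- dfs[[nm]]
  # Group by idx and by the column that shares the data frame's name.
  aggregate(X["value"], by = X[c("idx", nm)], FUN = sum)
})
names(sums) <- names(dfs)
```

The same pattern should carry over to plyr, since ddply also accepts a character vector of grouping variables, for example c("idx", nm).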

Calculate “group characteristics” without ddply and merge

I wonder whether there is a more straightforward way to calculate a certain type of variable than the approach I normally take. The example below probably explains it best. I have a data frame with 2 columns (the fruit, and whether the fruit is rotten or not). For each row, I would like to add, e.g., the percentage of fruit of the same category that is rotten. For example, there are 4 entries for apples, 2 of them are rotten, so each row for apple should read 0.5. The target values (purely as illustration) are included in the "desired outcome" column. I have previously approached this problem by *
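The asker's data and previous approach are not shown in this excerpt. As a minimal sketch of one way to avoid the ddply-plus-merge round trip, base R's ave() computes a per-group statistic and returns it aligned with the original rows. The data frame fruit_df and the column share_rotten are hypothetical.

```r
# Hypothetical data: 4 apples (2 rotten) and 4 pears (1 rotten).
fruit_df <- data.frame(
  fruit  = c("apple", "apple", "apple", "apple", "pear", "pear", "pear", "pear"),
  rotten = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# ave() applies FUN within each fruit group and returns one value per row,
# so no separate summary table or merge step is needed.
fruit_df$share_rotten <- ave(as.numeric(fruit_df$rotten), fruit_df$fruit, FUN = mean)
```

Every apple row then carries 0.5, matching the example in the question.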

Using plyr::mapvalues with dplyr

Question: plyr::mapvalues can be used like this:
    mapvalues(mtcars$cyl, c(4, 6, 8), c("a", "b", "c"))
But this doesn't work:
    mtcars %>% dplyr::select(cyl) %>% mapvalues(c(4, 6, 8), c("a", "b", "c")) %>% as.data.frame()
How can I use plyr::mapvalues with dplyr? Or even better, what is the dplyr equivalent?
Answer 1: To use it and return a one-column data.frame:
    mtcars %>% transmute(cyl = plyr::mapvalues(cyl, c(4, 6, 8), c("a", "b", "c")))
Or if you want a single vector output, like in your working example, use

Calculating hourly averages from a multi-year timeseries

I have a dataset filled with the average windspeed per hour for multiple years. I would like to create an 'average year', in which for each hour the average windspeed for that hour over multiple years is calculated. How can I do this without looping endlessly through the dataset? Ideally, I would like to just loop through the data once, extracting for each row the right month, day, and hour, and adding the windspeed from that row to the right row in a dataframe where the aggregates for each month, day, and hour are gathered. Is it possible to do this without extracting the month, day, and hour
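The question is cut off, so this is only a sketch under stated assumptions: the data live in a data frame with a POSIXct column, and the names wind, time, speed, and avg_year are all hypothetical. It builds a single month-day-hour key with one vectorised format() call and averages per key with base R's aggregate(), which avoids an explicit loop over rows.

```r
# Hypothetical data: two non-leap years of hourly wind speeds.
set.seed(1)
times <- seq(as.POSIXct("2010-01-01 00:00", tz = "UTC"),
             as.POSIXct("2011-12-31 23:00", tz = "UTC"), by = "hour")
wind <- data.frame(time = times, speed = runif(length(times), 0, 20))

# A month-day-hour key such as "01-01 00" is identical in every year.
wind$key <- format(wind$time, "%m-%d %H")

# Mean wind speed for each month-day-hour slot across all years.
avg_year <- aggregate(speed ~ key, data = wind, FUN = mean)
```

With real data that includes a leap year, the key "02-29 ..." would be averaged over fewer years than the other slots.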