plyr

Using ddply inside a function

Submitted by 戏子无情 on 2019-11-30 15:15:40
I'm trying to write a function that uses ddply inside it, but I can't get it to work. This is a dummy example reproducing what I get. Does this have anything to do with this bug?

```r
library(ggplot2)
data(diamonds)

foo <- function(data, fac1, fac2, bar) {
  res <- ddply(data, .(fac1, fac2), mean(bar))
  res
}
foo(diamonds, "color", "cut", "price")
```

I don't believe this is a bug. ddply expects the name of a function, which you haven't really supplied with `mean(bar)`. You need to write a complete function that calculates the mean you'd like:

```r
foo <- function(data, fac1, fac2, bar) {
  res <- ddply(data, c(fac1,
```
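A minimal completed version of that function might look like the sketch below. The column names are passed as strings and the statistic is computed inside an anonymous function; the output name `mean_bar` is my choice, not from the original answer.

```r
library(plyr)
library(ggplot2)  # only for the diamonds data set
data(diamonds)

# ddply needs a *function* as its .fun argument; mean(bar) is an already
# evaluated expression. Likewise .(fac1, fac2) quotes the literal names
# "fac1"/"fac2" rather than the strings they hold, so c(fac1, fac2) is used.
foo <- function(data, fac1, fac2, bar) {
  ddply(data, c(fac1, fac2), function(d) c(mean_bar = mean(d[[bar]])))
}

res <- foo(diamonds, "color", "cut", "price")
head(res)
```

With diamonds this yields one row per color/cut combination (7 colors x 5 cuts = 35 rows).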

Find the minimum distance between two data frames, for each element in the second data frame

Submitted by 两盒软妹~` on 2019-11-30 14:23:38
I have two data frames, ev1 and ev2, describing timestamps of two types of events collected over many tests, so each data frame has the columns "test_id" and "timestamp". What I need to find is the minimum distance to an ev1 event for each ev2 event, within the same test. I have working code that merges the two datasets, calculates the distances, and then uses dplyr to filter for the minimum distance:

```r
ev1 = data.frame(test_id = c(0, 0, 0, 1, 1, 1), time = c(1, 2, 3, 2, 3, 4))
ev2 = data.frame(test_id = c(0, 0, 0,
```
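One way to finish the merge-based approach looks like the sketch below; the grouping columns and the `.groups` argument are my choices, not taken from the truncated original.

```r
library(dplyr)

ev1 <- data.frame(test_id = c(0, 0, 0, 1, 1, 1), time = c(1, 2, 3, 2, 3, 4))
ev2 <- data.frame(test_id = c(0, 0, 0, 1, 1, 1), time = c(6, 1, 8, 4, 5, 11))

# merge() cross-joins the events within each test_id; the minimum absolute
# time difference is then taken per ev2 event.
res <- merge(ev2, ev1, by = "test_id", suffixes = c(".ev2", ".ev1")) %>%
  mutate(dist = abs(time.ev2 - time.ev1)) %>%
  group_by(test_id, time.ev2) %>%
  summarise(min_dist = min(dist), .groups = "drop")
```

For these inputs the minimum distances are 0, 3, 5 for test 0 (ev2 times 1, 6, 8) and 0, 1, 7 for test 1 (ev2 times 4, 5, 11).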

Accessing grouped data in dplyr

Submitted by 自闭症网瘾萝莉.ら on 2019-11-30 12:42:53
How can I access the grouped data after applying the group_by function from dplyr, using the %.% operator? For example, if I want the first row of each group, I can do it with the plyr package like this:

```r
ddply(iris, .(Species), function(df) { df[1, ] })
#   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
# 1          5.1         3.5          1.4         0.2     setosa
# 2          7.0         3.2          4.7         1.4 versicolor
# 3          6.3         3.3          6.0         2.5  virginica
```

For your specific case, you can use row_number():

```r
library(dplyr)
iris %.% group_by(Species) %.% filter(row_number(Species) == 1)
## Source: local data frame [3 x 5]
## Groups: Species
```
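The `%.%` operator in that answer comes from an early dplyr release and has since been superseded by `%>%`; a current-dplyr sketch of the same idea is:

```r
library(dplyr)

# slice(1) returns the first row of each group directly, without needing
# row_number() inside filter().
first_rows <- iris %>%
  group_by(Species) %>%
  slice(1) %>%
  ungroup()
first_rows
```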

Transfer large MongoDB collections to data.frame in R with rmongodb and plyr

Submitted by 狂风中的少年 on 2019-11-30 10:33:12
I get some strange results with huge collections when trying to transfer them as data frames from MongoDB to R with the rmongodb and plyr packages. I picked up this code from various GitHub repositories and forums on the subject, and adapted it for my purposes:

```r
## load both packages
library(rmongodb)
library(plyr)

## connect to MongoDB
mongo <- mongo.create(host = "localhost")
# [1] TRUE

## get the list of databases
mongo.get.databases(mongo)
# list of databases (with mydatabase)

## get the list of collections in mydatabase
mongo.get.collections(mongo, db = "mydatabase")
# list of all the collections
```
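The step that usually follows (not shown in the truncated snippet) is flattening the list of documents returned by the driver into a data frame, and that part can be demonstrated without a running MongoDB. The `records` list below is a hypothetical stand-in for what a find-all call returns:

```r
library(plyr)

# Hypothetical stand-in for a list of MongoDB documents: named lists
# whose fields are not all present in every document.
records <- list(
  list(name = "a", value = 1),
  list(name = "b", value = 2, extra = TRUE),
  list(name = "c", value = 3)
)

# ldply() converts each document and row-binds the pieces with rbind.fill(),
# padding missing fields with NA -- the reason plyr is handy here.
df <- ldply(records, function(rec) as.data.frame(rec, stringsAsFactors = FALSE))
```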

data.table or dplyr - data manipulation

Submitted by 会有一股神秘感。 on 2019-11-30 09:43:50
I have the following data:

    Date        Col1  Col2
    2014-01-01  123   12
    2014-01-01  123   21
    2014-01-01  124   32
    2014-01-01  125   32
    2014-01-02  123   34
    2014-01-02  126   24
    2014-01-02  127   23
    2014-01-03  521   21
    2014-01-03  123   13
    2014-01-03  126   15

Now, I want to count the unique values in Col1 for each date (values that did not appear on a previous date), and add them to the previous count. For example:

    Date        Count
    2014-01-01  3     i.e. 123, 124, 125
    2014-01-02  5     (2 + the 3 above) i.e. 126, 127
    2014-01-03  6     (1 + the 5 above) i.e. 521 only

lukeA answered:

```r
library(dplyr)
df %.% arrange(Date) %.% filter(!duplicated(Col1)) %.% group_by(Date) %.% summarise(Count
```
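lukeA's truncated pipeline can be completed roughly as below, rewritten with the modern `%>%` operator; the trailing cumulative sum is my addition, implied by the desired output:

```r
library(dplyr)

df <- data.frame(
  Date = as.Date(rep(c("2014-01-01", "2014-01-02", "2014-01-03"), c(4, 3, 3))),
  Col1 = c(123, 123, 124, 125, 123, 126, 127, 521, 123, 126),
  Col2 = c(12, 21, 32, 32, 34, 24, 23, 21, 13, 15)
)

# Keep only the first occurrence of each Col1 value, count the new values
# per date, then accumulate the counts across dates.
res <- df %>%
  arrange(Date) %>%
  filter(!duplicated(Col1)) %>%
  group_by(Date) %>%
  summarise(Count = n(), .groups = "drop") %>%
  mutate(Count = cumsum(Count))
```

This reproduces the desired counts 3, 5, 6.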

renaming the output column with the plyr package in R

Submitted by 谁说胖子不能爱 on 2019-11-30 08:50:35
Hadley turned me on to the plyr package and I find myself using it all the time for 'group by' sorts of tasks. But I always have to rename the resulting columns, since they default to V1, V2, etc. Here's an example:

```r
mydata <- data.frame(matrix(rnorm(144, mean = 2, sd = 2), 72, 2),
                     c(rep("A", 24), rep("B", 24), rep("C", 24)))
colnames(mydata) <- c("x_value", "acres", "state")
groupAcres <- ddply(mydata, c("state"), function(df) c(sum(df$acres)))
colnames(groupAcres) <- c("state", "stateAcres")
```

Is there a way to make ddply name the resulting column for me, so I can omit that last line?
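Yes: passing `summarise` as ddply's .fun lets you name the output column in the call itself. A sketch (with the data built column-by-column rather than via `matrix()`, and a seed added for reproducibility):

```r
library(plyr)

set.seed(42)
mydata <- data.frame(
  x_value = rnorm(72, mean = 2, sd = 2),
  acres   = rnorm(72, mean = 2, sd = 2),
  state   = rep(c("A", "B", "C"), each = 24)
)

# plyr's summarise evaluates its named arguments once per group, so the
# result column is called stateAcres from the start -- no renaming needed.
groupAcres <- ddply(mydata, "state", summarise, stateAcres = sum(acres))
```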

group by and scale/normalize a column in r

Submitted by 笑着哭i on 2019-11-30 08:16:17
I have a dataframe that looks like this:

      Store Temperature Unemployment Sum_Sales
    1     1       42.31        8.106   1643691
    2     1       38.51        8.106   1641957
    3     1       39.93        8.106   1611968
    4     1       46.63        8.106   1409728
    5     1       46.50        8.106   1554807
    6     1       57.79        8.106   1439542

What I can't figure out in R is how to group by and apply: for each store (grouped), I want to normalize/scale two columns (Sum_Sales and Temperature). The desired output is the following:

      Store Temperature Unemployment Sum_Sales
    1     1       1.000        8.106   1.00000
    2     1       0
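The exact scaling the asker wants is ambiguous from the truncated output, but the group-by-and-apply pattern itself can be sketched with dplyr. Here I use min-max rescaling to [0, 1] as one common choice, and split the sample rows across two hypothetical stores so the grouping actually does something:

```r
library(dplyr)

df <- data.frame(
  Store = rep(1:2, each = 3),  # assumption: two stores, for illustration
  Temperature = c(42.31, 38.51, 39.93, 46.63, 46.50, 57.79),
  Unemployment = 8.106,
  Sum_Sales = c(1643691, 1641957, 1611968, 1409728, 1554807, 1439542)
)

# Rescale a vector to [0, 1]; swap this for scale() or x/max(x) if a
# different normalization is wanted.
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# mutate() after group_by() applies the scaling within each store.
res <- df %>%
  group_by(Store) %>%
  mutate(Temperature = rescale01(Temperature),
         Sum_Sales = rescale01(Sum_Sales)) %>%
  ungroup()
```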

R ggplot and facet grid: how to control x-axis breaks

Submitted by 末鹿安然 on 2019-11-30 06:48:48
I am trying to plot the change in a time series for each calendar year using ggplot, and I am having problems with fine control of the x-axis. If I do not use scale="free_x", I end up with an x-axis that shows several years as well as the year in question. If I do use scale="free_x", then, as one would expect, I end up with tick labels for each plot that in some cases vary by plot, which I do not want. I have made various attempts to define the x-axis using scale_x_date etc., but without any success. My question is therefore: how can I control the x-axis breaks and labels?
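One approach that usually works for this (a sketch under the assumption of one facet per calendar year, with made-up weekly data) is to keep `scales = "free_x"` and pin the tick positions and labels down explicitly with `scale_x_date`, so every panel uses the same month-based breaks:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  date = seq(as.Date("2012-01-01"), as.Date("2013-12-31"), by = "week")
)
df$value <- cumsum(rnorm(nrow(df)))
df$year  <- format(df$date, "%Y")

# date_breaks/date_labels give each free-x panel identical, month-level
# ticks labelled without the year, so panels look uniform.
p <- ggplot(df, aes(date, value)) +
  geom_line() +
  facet_wrap(~ year, scales = "free_x") +
  scale_x_date(date_breaks = "1 month", date_labels = "%b")
```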

split apply recombine, plyr, data.table in R

Submitted by 白昼怎懂夜的黑 on 2019-11-30 05:23:10
I am doing the classic split-apply-recombine thing in R. My data set is a bunch of firms over time. The "apply" step runs a regression for each firm and returns the residuals; I am therefore not aggregating by firm. plyr is great for this, but it takes a very long time to run when the number of firms is large. Is there a way to do this with data.table?

Sample data:

    dte,        id, val1, val2
    2001-10-02,  1,   10,   25
    2001-10-03,  1,   11,   24
    2001-10-04,  1,   12,   23
    2001-10-02,  2,   13,   22
    2001-10-03,  2,   14,   21

I need to split by each id (namely 1 and 2), run a regression, and return the residuals.
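A data.table sketch of that split/apply/recombine (the regression formula `val1 ~ val2` is my assumption from the sample columns):

```r
library(data.table)

dt <- data.table(
  dte  = as.Date(c("2001-10-02", "2001-10-03", "2001-10-04",
                   "2001-10-02", "2001-10-03")),
  id   = c(1, 1, 1, 2, 2),
  val1 = c(10, 11, 12, 13, 14),
  val2 = c(25, 24, 23, 22, 21)
)

# `by = id` runs the j-expression once per firm; returning a list with the
# residual vector recombines everything into one table, no plyr needed.
res <- dt[, .(dte, resid = resid(lm(val1 ~ val2))), by = id]
```

Because the j-expression returns one residual per row, `res` has the same number of rows as `dt`, with an extra `resid` column per firm.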