plyr

Using ddply inside a function

Submitted by 戏子无情 on 2019-11-30 15:15:40
I'm trying to write a function that uses ddply inside it, but I can't get it to work. This is a dummy example reproducing what I get. Does this have anything to do with this bug?

```r
library(ggplot2)
data(diamonds)

foo <- function(data, fac1, fac2, bar) {
  res <- ddply(data, .(fac1, fac2), mean(bar))
  res
}
foo(diamonds, "color", "cut", "price")
```

I don't believe this is a bug. ddply expects the name of a function, which you haven't really supplied with `mean(bar)`. You need to write a complete function that calculates the mean you'd like:

```r
foo <- function(data, fac1, fac2, bar) {
  res <- ddply(data, c(fac1,
```
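A minimal completed version of that function might look like the sketch below. The column names are passed as strings and the statistic is computed inside an anonymous function; the output name `mean_bar` is my choice, not from the original answer.

```r
library(plyr)
library(ggplot2)  # only for the diamonds data set
data(diamonds)

# ddply needs a *function* as its .fun argument; mean(bar) is an already
# evaluated expression. Likewise .(fac1, fac2) quotes the literal names
# "fac1"/"fac2" rather than the strings they hold, so c(fac1, fac2) is used.
foo <- function(data, fac1, fac2, bar) {
  ddply(data, c(fac1, fac2), function(d) c(mean_bar = mean(d[[bar]])))
}

res <- foo(diamonds, "color", "cut", "price")
head(res)
```

With diamonds this yields one row per color/cut combination (7 colors x 5 cuts = 35 rows).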

Find the minimum distance between two data frames, for each element in the second data frame

Submitted by 两盒软妹~` on 2019-11-30 14:23:38
I have two data frames, ev1 and ev2, describing timestamps of two types of events collected over many tests, so each data frame has the columns "test_id" and "timestamp". What I need to find is the minimum distance to an ev1 event for each ev2 event, within the same test. I have working code that merges the two datasets, calculates the distances, and then uses dplyr to filter for the minimum distance:

```r
ev1 = data.frame(test_id = c(0, 0, 0, 1, 1, 1), time = c(1, 2, 3, 2, 3, 4))
ev2 = data.frame(test_id = c(0, 0, 0,
```
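One way to finish the merge-based approach looks like the sketch below; the grouping columns and the `.groups` argument are my choices, not taken from the truncated original.

```r
library(dplyr)

ev1 <- data.frame(test_id = c(0, 0, 0, 1, 1, 1), time = c(1, 2, 3, 2, 3, 4))
ev2 <- data.frame(test_id = c(0, 0, 0, 1, 1, 1), time = c(6, 1, 8, 4, 5, 11))

# merge() cross-joins the events within each test_id; the minimum absolute
# time difference is then taken per ev2 event.
res <- merge(ev2, ev1, by = "test_id", suffixes = c(".ev2", ".ev1")) %>%
  mutate(dist = abs(time.ev2 - time.ev1)) %>%
  group_by(test_id, time.ev2) %>%
  summarise(min_dist = min(dist), .groups = "drop")
```

For these inputs the minimum distances are 0, 3, 5 for test 0 (ev2 times 1, 6, 8) and 0, 1, 7 for test 1 (ev2 times 4, 5, 11).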

Accessing grouped data in dplyr

Submitted by 自闭症网瘾萝莉.ら on 2019-11-30 12:42:53
How can I access the grouped data after applying the group_by function from dplyr, using the %.% operator? For example, if I want the first row of each group, I can do it with the plyr package like this:

```r
ddply(iris, .(Species), function(df) { df[1, ] })
#   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
# 1          5.1         3.5          1.4         0.2     setosa
# 2          7.0         3.2          4.7         1.4 versicolor
# 3          6.3         3.3          6.0         2.5  virginica
```

For your specific case, you can use row_number():

```r
library(dplyr)
iris %.% group_by(Species) %.% filter(row_number(Species) == 1)
## Source: local data frame [3 x 5]
## Groups: Species
```
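The `%.%` operator in that answer comes from an early dplyr release and has since been superseded by `%>%`; a current-dplyr sketch of the same idea is:

```r
library(dplyr)

# slice(1) returns the first row of each group directly, without needing
# row_number() inside filter().
first_rows <- iris %>%
  group_by(Species) %>%
  slice(1) %>%
  ungroup()
first_rows
```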

Transfer large MongoDB collections to data.frame in R with rmongodb and plyr

Submitted by 狂风中的少年 on 2019-11-30 10:33:12
I get some strange results with huge collections when trying to transfer them as data frames from MongoDB to R with the rmongodb and plyr packages. I picked up this code from various GitHub repositories and forums on the subject, and adapted it for my purposes:

```r
## load both packages
library(rmongodb)
library(plyr)

## connect to MongoDB
mongo <- mongo.create(host = "localhost")
# [1] TRUE

## get the list of databases
mongo.get.databases(mongo)
# list of databases (with mydatabase)

## get the list of collections in mydatabase
mongo.get.collections(mongo, db = "mydatabase")
# list of all the collections
```
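The step that usually follows (not shown in the truncated snippet) is flattening the list of documents returned by the driver into a data frame, and that part can be demonstrated without a running MongoDB. The `records` list below is a hypothetical stand-in for what a find-all call returns:

```r
library(plyr)

# Hypothetical stand-in for a list of MongoDB documents: named lists
# whose fields are not all present in every document.
records <- list(
  list(name = "a", value = 1),
  list(name = "b", value = 2, extra = TRUE),
  list(name = "c", value = 3)
)

# ldply() converts each document and row-binds the pieces with rbind.fill(),
# padding missing fields with NA -- the reason plyr is handy here.
df <- ldply(records, function(rec) as.data.frame(rec, stringsAsFactors = FALSE))
```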

data.table or dplyr - data manipulation

Submitted by 会有一股神秘感。 on 2019-11-30 09:43:50
I have the following data:

    Date        Col1  Col2
    2014-01-01  123   12
    2014-01-01  123   21
    2014-01-01  124   32
    2014-01-01  125   32
    2014-01-02  123   34
    2014-01-02  126   24
    2014-01-02  127   23
    2014-01-03  521   21
    2014-01-03  123   13
    2014-01-03  126   15

Now, I want to count the unique values in Col1 for each date (values that did not appear on a previous date), and add them to the previous count. For example:

    Date        Count
    2014-01-01  3     i.e. 123, 124, 125
    2014-01-02  5     (2 + the 3 above) i.e. 126, 127
    2014-01-03  6     (1 + the 5 above) i.e. 521 only

lukeA answered:

```r
library(dplyr)
df %.% arrange(Date) %.% filter(!duplicated(Col1)) %.% group_by(Date) %.% summarise(Count
```
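lukeA's truncated pipeline can be completed roughly as below, rewritten with the modern `%>%` operator; the trailing cumulative sum is my addition, implied by the desired output:

```r
library(dplyr)

df <- data.frame(
  Date = as.Date(rep(c("2014-01-01", "2014-01-02", "2014-01-03"), c(4, 3, 3))),
  Col1 = c(123, 123, 124, 125, 123, 126, 127, 521, 123, 126),
  Col2 = c(12, 21, 32, 32, 34, 24, 23, 21, 13, 15)
)

# Keep only the first occurrence of each Col1 value, count the new values
# per date, then accumulate the counts across dates.
res <- df %>%
  arrange(Date) %>%
  filter(!duplicated(Col1)) %>%
  group_by(Date) %>%
  summarise(Count = n(), .groups = "drop") %>%
  mutate(Count = cumsum(Count))
```

This reproduces the desired counts 3, 5, 6.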

renaming the output column with the plyr package in R

Submitted by 谁说胖子不能爱 on 2019-11-30 08:50:35
Hadley turned me on to the plyr package and I find myself using it all the time for 'group by' sorts of tasks. But I always have to rename the resulting columns, since they default to V1, V2, etc. Here's an example:

```r
mydata <- data.frame(matrix(rnorm(144, mean = 2, sd = 2), 72, 2),
                     c(rep("A", 24), rep("B", 24), rep("C", 24)))
colnames(mydata) <- c("x_value", "acres", "state")
groupAcres <- ddply(mydata, c("state"), function(df) c(sum(df$acres)))
colnames(groupAcres) <- c("state", "stateAcres")
```

Is there a way to make ddply name the resulting column for me, so I can omit that last line?
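Yes: passing `summarise` as ddply's .fun lets you name the output column in the call itself. A sketch (with the data built column-by-column rather than via `matrix()`, and a seed added for reproducibility):

```r
library(plyr)

set.seed(42)
mydata <- data.frame(
  x_value = rnorm(72, mean = 2, sd = 2),
  acres   = rnorm(72, mean = 2, sd = 2),
  state   = rep(c("A", "B", "C"), each = 24)
)

# plyr's summarise evaluates its named arguments once per group, so the
# result column is called stateAcres from the start -- no renaming needed.
groupAcres <- ddply(mydata, "state", summarise, stateAcres = sum(acres))
```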

group by and scale/normalize a column in r

Submitted by 笑着哭i on 2019-11-30 08:16:17
I have a dataframe that looks like this:

      Store Temperature Unemployment Sum_Sales
    1     1       42.31        8.106   1643691
    2     1       38.51        8.106   1641957
    3     1       39.93        8.106   1611968
    4     1       46.63        8.106   1409728
    5     1       46.50        8.106   1554807
    6     1       57.79        8.106   1439542

What I can't figure out in R is how to group by and apply: for each store (grouped), I want to normalize/scale two columns (Sum_Sales and Temperature). The desired output is the following:

      Store Temperature Unemployment Sum_Sales
    1     1       1.000        8.106   1.00000
    2     1       0
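The exact scaling the asker wants is ambiguous from the truncated output, but the group-by-and-apply pattern itself can be sketched with dplyr. Here I use min-max rescaling to [0, 1] as one common choice, and split the sample rows across two hypothetical stores so the grouping actually does something:

```r
library(dplyr)

df <- data.frame(
  Store = rep(1:2, each = 3),  # assumption: two stores, for illustration
  Temperature = c(42.31, 38.51, 39.93, 46.63, 46.50, 57.79),
  Unemployment = 8.106,
  Sum_Sales = c(1643691, 1641957, 1611968, 1409728, 1554807, 1439542)
)

# Rescale a vector to [0, 1]; swap this for scale() or x/max(x) if a
# different normalization is wanted.
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# mutate() after group_by() applies the scaling within each store.
res <- df %>%
  group_by(Store) %>%
  mutate(Temperature = rescale01(Temperature),
         Sum_Sales = rescale01(Sum_Sales)) %>%
  ungroup()
```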

R ggplot and facet grid: how to control x-axis breaks

Submitted by 末鹿安然 on 2019-11-30 06:48:48
I am trying to plot the change in a time series for each calendar year using ggplot, and I am having problems with fine control of the x-axis. If I do not use scale="free_x", I end up with an x-axis that shows several years as well as the year in question. If I do use scale="free_x", then, as one would expect, I end up with tick labels for each plot that in some cases vary by plot, which I do not want. I have made various attempts to define the x-axis using scale_x_date etc., but without any success. My question is therefore: how can I control the x-axis breaks and labels?
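One approach that usually works for this (a sketch under the assumption of one facet per calendar year, with made-up weekly data) is to keep `scales = "free_x"` and pin the tick positions and labels down explicitly with `scale_x_date`, so every panel uses the same month-based breaks:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  date = seq(as.Date("2012-01-01"), as.Date("2013-12-31"), by = "week")
)
df$value <- cumsum(rnorm(nrow(df)))
df$year  <- format(df$date, "%Y")

# date_breaks/date_labels give each free-x panel identical, month-level
# ticks labelled without the year, so panels look uniform.
p <- ggplot(df, aes(date, value)) +
  geom_line() +
  facet_wrap(~ year, scales = "free_x") +
  scale_x_date(date_breaks = "1 month", date_labels = "%b")
```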

split apply recombine, plyr, data.table in R

Submitted by 白昼怎懂夜的黑 on 2019-11-30 05:23:10
I am doing the classic split-apply-recombine thing in R. My data set is a bunch of firms over time. The "apply" step runs a regression for each firm and returns the residuals; I am therefore not aggregating by firm. plyr is great for this, but it takes a very long time to run when the number of firms is large. Is there a way to do this with data.table?

Sample data:

    dte,        id, val1, val2
    2001-10-02,  1,   10,   25
    2001-10-03,  1,   11,   24
    2001-10-04,  1,   12,   23
    2001-10-02,  2,   13,   22
    2001-10-03,  2,   14,   21

I need to split by each id (namely 1 and 2), run a regression, and return the residuals.
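A data.table sketch of that split/apply/recombine (the regression formula `val1 ~ val2` is my assumption from the sample columns):

```r
library(data.table)

dt <- data.table(
  dte  = as.Date(c("2001-10-02", "2001-10-03", "2001-10-04",
                   "2001-10-02", "2001-10-03")),
  id   = c(1, 1, 1, 2, 2),
  val1 = c(10, 11, 12, 13, 14),
  val2 = c(25, 24, 23, 22, 21)
)

# `by = id` runs the j-expression once per firm; returning a list with the
# residual vector recombines everything into one table, no plyr needed.
res <- dt[, .(dte, resid = resid(lm(val1 ~ val2))), by = id]
```

Because the j-expression returns one residual per row, `res` has the same number of rows as `dt`, with an extra `resid` column per firm.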