plyr

Understanding ddply error message

跟風遠走 submitted on 2019-12-04 20:27:10
Question: I am trying to figure out why I get an error message when using ddply. Example data:

    data <- data.frame(area=rep(c("VA","OC","ES"),each=4),
                       sex=rep(c("Male","Female"),each=2,times=3),
                       year=rep(c(2009,2010),times=6),
                       bin=c(110,120,125,125,110,130,125,80,90,90,80,140),
                       shell_length=c(.4,4,1,2,.2,5,.4,4,.8,4,.3,4))
    bin7 <- ddply(data, .(area,year,sex,bin), summarize, n_bin=length(shell_length))

Error message:

    Error in .fun(piece, ...) : argument "by" is missing, with no default

I got this
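A frequently reported cause of this exact message is that another attached package (Hmisc is the usual suspect) exports its own summarize() whose leading arguments are X and by. If that package is attached after plyr, the bare name summarize resolves to it, ddply calls it on each piece, and it complains that by is missing. Assuming that is what is happening here, a minimal sketch of the fix is to name plyr's function explicitly; find() shows which attached package a bare name currently resolves to.

```r
library(plyr)

data <- data.frame(area = rep(c("VA", "OC", "ES"), each = 4),
                   sex = rep(c("Male", "Female"), each = 2, times = 3),
                   year = rep(c(2009, 2010), times = 6),
                   bin = c(110, 120, 125, 125, 110, 130, 125, 80, 90, 90, 80, 140),
                   shell_length = c(.4, 4, 1, 2, .2, 5, .4, 4, .8, 4, .3, 4))

# Qualify the summary function with plyr:: so that a summarize() exported by
# another attached package (e.g. Hmisc) cannot be picked up instead
bin7 <- ddply(data, .(area, year, sex, bin), plyr::summarise,
              n_bin = length(shell_length))

# Lists every attached package that exports a function called summarize;
# the first entry is the one a bare `summarize` resolves to
find("summarize")
```

With this example data every area/year/sex/bin combination occurs exactly once, so each n_bin is 1.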

Merging files (and file names) in R

假装没事ソ submitted on 2019-12-04 20:19:49
I'm trying to merge a directory full of comma-delimited text files using R, while also incorporating the file name of each file as a new variable in the data set. I've been using the following:

    library(plyr)
    file_list <- list.files()
    dataset <- ldply(file_list, read.table, header=FALSE, sep=",")

Can anyone shed any light on how I'd add the file name for each file read as a new variable within dataset? Many thanks, -Jon

Answer: You can just make a wrapper around the read.table() function that adds in your filename variable. Something like this should work:

    read.data <- function(file){
      dat <- read.table
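The wrapper idea above can be sketched as follows. The helper name read_with_name and the file_name column are hypothetical, and the two small files are written to a temporary directory only so that the example runs on its own; on real data you would point list.files() at your own directory.

```r
library(plyr)

# Hypothetical helper: read one comma-delimited file and record where it came from
read_with_name <- function(file) {
  dat <- read.table(file, header = FALSE, sep = ",")
  dat$file_name <- basename(file)  # one value per row of this file
  dat
}

# Stand-in files in a temporary directory
demo_dir <- tempfile("csvs")
dir.create(demo_dir)
writeLines(c("1,2", "3,4"), file.path(demo_dir, "a.txt"))
writeLines("5,6", file.path(demo_dir, "b.txt"))

# full.names = TRUE so read.table() finds the files regardless of the working directory
file_list <- list.files(demo_dir, full.names = TRUE)
dataset <- ldply(file_list, read_with_name)
dataset
```

Using basename() keeps only the file name in the new column; drop it if you prefer the full path.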

Function “diff” over various groups in R

旧街凉风 submitted on 2019-12-04 19:32:46
I have a data frame with two grouping variables, one time variable and a dependent variable, e.g.:

    name <- c("a", "a", "a", "a", "a", "a","a", "a", "a", "b", "b", "b","b", "b", "b","b", "b", "b")
    class <- c("c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3","c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3")
    year <- c("2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008")
    value <- c(100, 33, 80, 90, 80, 100, 100, 90, 80, 90, 80, 100, 100, 90, 80, 99, 80, 100)
    df <- data.frame(name, class, year, value)
    df

and would like
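Because the question breaks off after "would like", this sketch assumes the goal is the year-over-year difference of value within each name/class pair. It uses base R's ave() rather than a plyr function (with plyr, ddply(df, .(name, class), transform, ...) on the sorted data follows the same pattern). The diff column name is illustrative.

```r
# Same data as in the question, written more compactly with rep()
df <- data.frame(
  name  = rep(c("a", "b"), each = 9),
  class = rep(rep(c("c1", "c2", "c3"), each = 3), times = 2),
  year  = rep(c("2010", "2009", "2008"), times = 6),
  value = c(100, 33, 80, 90, 80, 100, 100, 90, 80,
            90, 80, 100, 100, 90, 80, 99, 80, 100)
)

# Sort so that diff() runs from the earliest to the latest year in each group;
# the year strings sort correctly because they all have four digits
df <- df[order(df$name, df$class, df$year), ]

# Within each name/class group: NA for the first year, then successive differences
df$diff <- ave(df$value, df$name, df$class, FUN = function(v) c(NA, diff(v)))
df
```

For group a/c1 the sorted values are 80, 33, 100, so the differences are NA, -47 and 67.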

Make regressions and predictions for groups in R

半腔热情 submitted on 2019-12-04 19:14:14
I have the following data.frame d from an experiment:

- Variable y (response, continuous)
- Factor f (500 levels)
- Time t (POSIXct)

Over the last 8 years, y was measured roughly once a month (exact date in t) for each level of f. Sometimes there are 2 measurements per month; sometimes a couple of months passed without any. Sorry for not providing example data, but making up an irregular time series is beyond my R knowledge. ;) I'd like to do the following with this data: make a regression using the loess() function (y ~ t), for each level of f make a prediction of y for the first day of
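A minimal sketch, assuming the goal is one loess(y ~ t) fit per level of f with predictions at the first day of each month. Only the column names f, t and y come from the question; the simulated data (two levels, 24 irregular dates each) and the helpers origin0, to_days() and month_starts() are hypothetical. It also shows one way to make up an irregular time series. Prediction dates are restricted to each level's observed range because predict() on a default loess fit returns NA outside it.

```r
set.seed(42)

# Hypothetical stand-in for d: two levels of f, 24 irregular measurements each
origin0 <- as.POSIXct("2010-01-01", tz = "UTC")
d <- do.call(rbind, lapply(c("A", "B"), function(lev) {
  tt <- origin0 + sort(runif(24, 0, 2 * 365 * 86400))  # random instants over ~2 years
  data.frame(f = lev, t = tt, y = cumsum(rnorm(24)))
}))

# Time as days since origin0, so loess works on modestly sized numbers
to_days <- function(x) as.numeric(difftime(x, origin0, units = "days"))

# First day of every month lying inside the observed range of tt
month_starts <- function(tt) {
  first <- as.POSIXct(format(min(tt), "%Y-%m-01", tz = "UTC"), tz = "UTC")
  grid <- seq(first, max(tt), by = "month")
  grid[grid >= min(tt)]
}

# One loess fit and one set of monthly predictions per level of f
pred <- do.call(rbind, lapply(split(d, d$f), function(g) {
  fit <- loess(y ~ days, data = data.frame(days = to_days(g$t), y = g$y))
  at <- month_starts(g$t)
  data.frame(f = g$f[1], t = at,
             y_hat = predict(fit, newdata = data.frame(days = to_days(at))))
}))
head(pred)
```

With 500 levels the same split/lapply pattern applies; levels with very few measurements may need a larger span or may not support a loess fit at all.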

Using Dates with the data.table package

寵の児 submitted on 2019-12-04 18:35:47
Question: I recently discovered the data.table package and am now wondering whether I should replace some of my plyr code. In short, I really like plyr and have achieved basically everything I wanted with it. However, my code takes a while to run, and the prospect of speeding things up was enough for me to run some tests. Those tests ended quite soon, and here is the reason. What I do quite often with plyr is split my data by a column containing dates and do some calculations:

    library(plyr)
    DF <- data.frame
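The question's own DF is cut off, so this hypothetical daily data frame only illustrates the pattern it describes: splitting by a Date column with ddply and computing a per-date summary. The names daily, per_day and mean_value are illustrative.

```r
library(plyr)

# Hypothetical data: two observations per day over three days
daily <- data.frame(date = as.Date("2011-01-01") + rep(0:2, each = 2),
                    value = 1:6)

# One group per distinct date; plyr::summarise computes a per-date statistic
per_day <- ddply(daily, .(date), plyr::summarise, mean_value = mean(value))
per_day
```

In data.table the corresponding grouping would be written DT[, .(mean_value = mean(value)), by = date].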

l_ply: how to pass the list's name attribute into the function?

不打扰是莪最后的温柔 submitted on 2019-12-04 17:15:58
Question: Say I have an R list like this:

    > summary(data.list)
                                     Length Class      Mode
    aug9104AP                        18     data.frame list
    Aug17-10_acon_7pt_dil_series_01  18     data.frame list
    Aug17-10_Picro_7pt_dil_series_01 18     data.frame list
    Aug17-10_PTZ_7pt_dil_series_01   18     data.frame list
    Aug17-10_Verat_7pt_dil_series_01 18     data.frame list

I want to process each data.frame in the list using l_ply, but I also need the name (e.g. aug9104AP) to be passed into the processing function along with the data.frame. Something like:

    l_ply
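One common workaround is to iterate over names(data.list) instead of the list itself and look each element up inside the function, so both the data frame and its name are available. In this sketch the small data frames and process_one() are hypothetical stand-ins; llply is used so the results can be inspected, and l_ply takes the same arguments when only side effects matter.

```r
library(plyr)

# Hypothetical stand-in for data.list: a named list of small data frames
data.list <- list(
  aug9104AP = data.frame(x = 1:3),
  "Aug17-10_PTZ_7pt_dil_series_01" = data.frame(x = 1:5)
)

# Hypothetical processing function that needs both the data and its name
process_one <- function(df, name) {
  sprintf("%s: %d rows", name, nrow(df))
}

# Iterate over the names and fetch each element by name
res <- llply(names(data.list), function(nm) process_one(data.list[[nm]], nm))

# Same idea with l_ply when only side effects (printing, writing files) are needed
l_ply(names(data.list), function(nm) cat(process_one(data.list[[nm]], nm), "\n"))
```

In base R, Map(process_one, data.list, names(data.list)) pairs each element with its name in the same way.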

How can I overlay two dense scatter plots so that I can see the outlines of each in R or Matlab?

人盡茶涼 submitted on 2019-12-04 17:02:21
Question: See this example. It was created in MATLAB by making two scatter plots independently, creating an image of each, then using imagesc to draw them into the same figure, and finally setting the alpha of the top image to 0.5. I would like to do this in R or MATLAB without using images, since creating an image does not preserve the axis scale information, nor can I overlay a grid (e.g. using 'grid on' in MATLAB). Ideally I would like to do this properly in MATLAB, but would also be happy
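For the R side, one alternative to compositing images is to draw both point clouds directly with semi-transparent colours (alpha blending). The plot then keeps its real axis scale and grid() works. This is a base-graphics sketch with made-up data; the alpha value of 0.2 is arbitrary, and whether the outlines stand out enough depends on point density. The output goes to a temporary PDF, which supports transparency.

```r
set.seed(1)

# Hypothetical point clouds standing in for the two dense scatter plots
x1 <- rnorm(5000);      y1 <- rnorm(5000)
x2 <- rnorm(5000, 1.5); y2 <- rnorm(5000, 0.5)

out <- tempfile(fileext = ".pdf")
pdf(out)
# Semi-transparent markers: overlapping regions get darker, so the extent of
# each cloud stays visible while the axes keep the real data scale
plot(x1, y1, pch = 16, cex = 0.5, col = adjustcolor("red", alpha.f = 0.2),
     xlim = range(x1, x2), ylim = range(y1, y2), xlab = "x", ylab = "y")
points(x2, y2, pch = 16, cex = 0.5, col = adjustcolor("blue", alpha.f = 0.2))
grid()  # grid lines are drawn in data coordinates
invisible(dev.off())
```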

Need faster rolling apply function with start to stop indices

坚强是说给别人听的谎言 submitted on 2019-12-04 14:55:36
Below is my code. It computes the percentile of the trade price level over a rolling 15-minute (historical) window. It runs quickly when the length is 500 or 1000, but as you can see there are 45K observations, and on the entire data set it is very slow. Can I apply any of the plyr functions? Any other suggestions are welcome. This is what the trade data looks like:

    > str(trade)
    'data.frame':	45571 obs. of  5 variables:
     $ time: chr  "2013-10-20 22:00:00.489" "2013-10-20 22:00:00.807" "2013-10-20 22:00:00.811" "2013-10-20 22:00:00.811" ...
     $ prc : num  121 121 121 121 121 ...
     $ siz : int  1 4 1 2 3 3 2 2 3 4 .
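The question's code is not shown, so this is not a fix for it but a base-R sketch (no plyr) under two assumptions: trades are sorted by time, and "percentile" means the share of prices in the trailing 15-minute window that are at or below the current price. The helper rolling_pct() and the toy trades are hypothetical. Most of the saving comes from locating every window's start with one vectorised findInterval() call instead of rescanning the timestamps for each trade; the per-trade comparison still loops over the window.

```r
# Hypothetical helper: for each trade, the share of prices in the trailing
# window (including the trade itself) that are <= its price
rolling_pct <- function(time, prc, window_secs = 15 * 60) {
  t_num <- as.numeric(time)  # seconds; time must be sorted ascending
  # findInterval() counts how many times are <= (t - window); adding 1 gives
  # the first index inside the window (t - window, t]
  start <- findInterval(t_num - window_secs, t_num) + 1
  vapply(seq_along(prc), function(i) {
    w <- prc[start[i]:i]
    mean(w <= prc[i])
  }, numeric(1))
}

# Toy trades using the same time format as the question
toy_trades <- data.frame(
  time = c("2013-10-20 22:00:00.489", "2013-10-20 22:05:00.807",
           "2013-10-20 22:10:00.811", "2013-10-20 22:16:40.811"),
  prc = c(1, 3, 2, 2.5)
)
# %OS reads the fractional seconds
toy_trades$t <- as.POSIXct(toy_trades$time, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
toy_trades$pct <- rolling_pct(toy_trades$t, toy_trades$prc)
toy_trades
```

For the last toy trade, the first trade has left the 15-minute window, so only the prices 3, 2 and 2.5 are compared with 2.5.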

Tabulate responses for multiple columns by grouping variable with dplyr

跟風遠走 submitted on 2019-12-04 14:44:16
Hi: I'm new to the plyr/dplyr family but enjoying it. I can see its massive utility for my own work, but I'm still trying to get my head around it. I have a data frame that looks like the one below.

1) How do I produce a table for each non-grouping variable that shows the distribution of responses within each value of the grouping variable?
2) Note: I do have some missing values and I would like to exclude them from the tabulation.

I realize the summarize_each command will apply the function to each column, but I don't know how to handle the missing values in a simple way. I have seen some code
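The asker's data frame is not shown, so this sketch uses a small hypothetical survey with a grouping column and two response columns. It uses base R's table() instead of dplyr: table() leaves NA out of the counts by default (useNA = "no"), which covers the missing-value requirement without extra filtering.

```r
# Hypothetical data: a grouping variable and two response columns with NAs
survey <- data.frame(
  group = c("x", "x", "y", "y", "y"),
  q1 = c("yes", "no", "yes", NA, "yes"),
  q2 = c("a", "a", NA, "b", "b")
)

response_cols <- setdiff(names(survey), "group")

# One response-by-group table per non-grouping column; NAs are not counted
tabs <- setNames(
  lapply(response_cols, function(col)
    table(response = survey[[col]], group = survey$group)),
  response_cols
)
tabs$q1
```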

Merging endpoints of a range with a sequence

若如初见. submitted on 2019-12-04 14:12:24
Question: In one of my applications there is a piece of code that retrieves information from a data.table object depending on values in another.

    # say this table contains customer details
    dt <- data.table(id=LETTERS[1:4],
                     start=seq(as.Date("2010-01-01"), as.Date("2010-04-01"), "month"),
                     end=seq(as.Date("2010-01-01"), as.Date("2010-04-01"), "month") + c(6,8,10,5),
                     key="id")
    # this one has some historical details
    dt1 <- data.table(id=rep(LETTERS[1:4], each=120), date=seq(as.Date("2010-01-01"), as.Date(
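The definition of dt1 is cut off, so this sketch assumes the goal is to keep the history rows whose date falls between each customer's start and end. It uses plain data.frames with merge() and a filter rather than data.table, and the daily history (120 days per id, with a value column) is hypothetical. The ranges table mirrors dt from the question.

```r
# Customer ranges, mirroring dt from the question
ranges <- data.frame(id = LETTERS[1:4],
                     start = seq(as.Date("2010-01-01"), as.Date("2010-04-01"), "month"))
ranges$end <- ranges$start + c(6, 8, 10, 5)

# Hypothetical history: one row per id per day for 120 days
history <- data.frame(
  id = rep(LETTERS[1:4], each = 120),
  date = rep(seq(as.Date("2010-01-01"), by = "day", length.out = 120), times = 4),
  value = seq_len(480)
)

# Join on id, then keep only the dates inside [start, end]
joined <- merge(history, ranges, by = "id")
in_range <- joined[joined$date >= joined$start & joined$date <= joined$end, ]
table(in_range$id)
```

Merging first materialises every id/date pair, which is fine at this size; for large tables a dedicated range join such as data.table's foverlaps() avoids that intermediate result.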