plyr | 易学教程

understanding ddply error message

I am trying to figure out why I am getting an error message when using ddply.

Example data:

    data <- data.frame(area=rep(c("VA","OC","ES"),each=4),
                       sex=rep(c("Male","Female"),each=2,times=3),
                       year=rep(c(2009,2010),times=6),
                       bin=c(110,120,125,125,110,130,125,80,90,90,80,140),
                       shell_length=c(.4,4,1,2,.2,5,.4,4,.8,4,.3,4))
    bin7 <- ddply(data, .(area,year,sex,bin), summarize, n_bin=length(shell_length))

Error message:

    Error in .fun(piece, ...) : argument "by" is missing, with no default

I got this error message yesterday. I restarted R and reran the code and everything was fine. This morning I got the
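One commonly reported cause of this message is that another attached package masks plyr's summarize. Hmisc, for instance, exports its own summarize() that takes a `by` argument. Whether that applies here is an assumption, since the question does not say which packages were loaded. A minimal sketch that sidesteps masking by naming the namespace explicitly:

```r
library(plyr)

data <- data.frame(area = rep(c("VA", "OC", "ES"), each = 4),
                   sex = rep(c("Male", "Female"), each = 2, times = 3),
                   year = rep(c(2009, 2010), times = 6),
                   bin = c(110, 120, 125, 125, 110, 130, 125, 80, 90, 90, 80, 140),
                   shell_length = c(.4, 4, 1, 2, .2, 5, .4, 4, .8, 4, .3, 4))

# plyr::summarise is resolved from the plyr namespace, so a summarize()
# attached later by another package (assumed cause) cannot shadow it.
bin7 <- ddply(data, .(area, year, sex, bin), plyr::summarise,
              n_bin = length(shell_length))
```

If the error disappears with the qualified name, checking `search()` or `conflicts()` in the failing session should show which package supplies the other summarize.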

merging endpoints of a range with a sequence

In one of my applications there is a piece of code that retrieves information from a data.table object depending on values in another.

    # say this table contains customer details
    dt <- data.table(id=LETTERS[1:4],
                     start=seq(as.Date("2010-01-01"), as.Date("2010-04-01"), "month"),
                     end=seq(as.Date("2010-01-01"), as.Date("2010-04-01"), "month") + c(6,8,10,5),
                     key="id")
    # this one has some historical details
    dt1 <- data.table(id=rep(LETTERS[1:4], each=120),
                      date=seq(as.Date("2010-01-01"), as.Date("2010-04-30"), "day"),
                      var=rnorm(120),
                      key="id,date")
    # and here I finally retrieve my historical

Using dplyr for exploratory plots

I regularly used d_ply to produce exploratory plots. A trivial example:

    require(plyr)
    require(ggplot2)  # qplot() comes from ggplot2
    plot_species <- function(species_data){
      p <- qplot(data=species_data, x=Sepal.Length, y=Sepal.Width)
      print(p)
    }
    d_ply(.data=iris, .variables="Species", function(x)plot_species(x))

This produces three separate plots, one for each species. I would like to reproduce this behaviour using functions in dplyr. This seems to require reassembling the data.frame within the function called by summarise, which is often impractical.

    require(dplyr)
    iris_by_species <- group_by(iris,Species)
    plot_species <- function

R Plyr - Ordering results from DDPLY?

Does anyone know a slick way to order the results coming out of a ddply summarise operation? This is what I'm doing to get the output ordered by descending depth.

    ddims <- ddply(diamonds, .(color), summarise, depth = mean(depth), table = mean(table))
    ddims <- ddims[order(-ddims$depth),]

With output...

    > ddims
      color    depth    table
    7     J 61.88722 57.81239
    6     I 61.84639 57.57728
    5     H 61.83685 57.51781
    4     G 61.75711 57.28863
    1     D 61.69813 57.40459
    3     F 61.69458 57.43354
    2     E 61.66209 57.49120

Not too ugly, but I'm hoping for a way to do it nicely within ddply(). Anyone know how? Hadley's ggplot2 book has this
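A minimal sketch of one option, not necessarily what the asker settled on: plyr exports arrange() and desc(), which can wrap the ddply call so the sorting happens in the same expression. The built-in mtcars data stands in for diamonds here (an assumption made so the example needs only plyr).

```r
library(plyr)

# Group means per cylinder count, then sort by descending mean mpg.
# mtcars is a stand-in for the diamonds data used in the question.
means <- arrange(
  ddply(mtcars, .(cyl), summarise, mpg = mean(mpg), wt = mean(wt)),
  desc(mpg)
)
```

arrange() also resets the row names, so the result no longer carries the shuffled row labels that order() leaves behind.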

Combine frequency tables into a single data frame

I have a list in which each list item is a word frequency table derived from using "table()" on a different sample text. Each table is, therefore, a different length. I now want to convert the list into a single data frame in which each column is a word and each row is a sample text. Here is a dummy example of my data:

    t1<-table(strsplit(tolower("this is a test in the event of a real word file you would see many more words here"), "\\W"))
    t2<-table(strsplit(tolower("Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the
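A base-R sketch of the reshaping described above, under the assumption that words missing from a text should count as 0. The two short texts and the names `s1` and `s2` are hypothetical stand-ins for the question's samples.

```r
# Hypothetical sample texts; names become the row labels of the result.
texts <- c(s1 = "a b a", s2 = "b c")
tabs <- lapply(strsplit(texts, " "), table)

# Every word that occurs in any table, in a fixed order.
words <- sort(unique(unlist(lapply(tabs, names))))

# One row per text: start from zeros, then fill in the observed counts.
counts <- t(sapply(tabs, function(tb) {
  row <- setNames(integer(length(words)), words)
  row[names(tb)] <- as.vector(tb)
  row
}))
freq <- as.data.frame(counts)
```

The same idea works with tables of any length, because each row is built against the shared `words` vector rather than against the other tables.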

Merge rows with duplicate IDs

Question: I would like to merge and sum the values of each row that contains duplicated IDs. For example, the data frame below contains a duplicated symbol 'LOC102723897'. I would like to merge these two rows and sum the values within each column, so that one row appears for the duplicated symbol.

    > head(y$genes)
      SM01 SM02 SM03 SM04 SM05 SM06 SM07 SM08 SM09 SM10 SM11 SM12 SM13 SM14 SM15 SM16 SM17 SM18 SM19 SM20 SM21 SM22
    1   32   29   23   20   27  105   80   64   83   80   94   58  122   76   78   70   34   32   45   42  138   30
    2  246  568
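A minimal sketch of merging by summation with plyr, assuming the IDs live in a column called `symbol` (the question's output is cut off before the ID column, so that name, the `GENE_A` row, and the values below are hypothetical).

```r
library(plyr)

# Hypothetical miniature of the data: one symbol appears twice.
genes <- data.frame(symbol = c("LOC102723897", "GENE_A", "LOC102723897"),
                    SM01 = c(32, 246, 8),
                    SM02 = c(29, 568, 1))

# numcolwise(sum) sums every numeric column within each symbol group,
# leaving one row per distinct symbol.
merged <- ddply(genes, .(symbol), numcolwise(sum))
```

Base R's `rowsum(genes[-1], genes$symbol)` does the same job without plyr and may be faster on wide count matrices.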

Select minimum data of grouped data - keeping all columns [duplicate]

Question: This question already has an answer here: R: Uniques (or dplyr distinct) + most recent date (1 answer). Closed 4 years ago.

I am running into a wall here. I have a dataframe with many rows. Here is a schematic example.

    #myDf
    ID c1 c2 myDate
    A  1  1  01.01.2015
    A  2  2  02.02.2014
    A  3  3  03.01.2014
    B  4  4  09.09.2009
    B  5  5  10.10.2010
    C  6  6  06.06.2011
    ....

I need to group my dataframe by my ID, then select the row with the oldest date, and write the output into a new dataframe, keeping all columns.

    ID c1
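A sketch of one way to pick the oldest row per ID while keeping every column, using plyr. It assumes the dates are day.month.year strings; the question does not state the format, and reading them as month.day.year would change which row is oldest for ID A.

```r
library(plyr)

myDf <- data.frame(ID = c("A", "A", "A", "B", "B", "C"),
                   c1 = 1:6, c2 = 1:6,
                   myDate = c("01.01.2015", "02.02.2014", "03.01.2014",
                              "09.09.2009", "10.10.2010", "06.06.2011"),
                   stringsAsFactors = FALSE)

# Assumption: dd.mm.yyyy. Parse to Date so the comparison is chronological.
myDf$myDate <- as.Date(myDf$myDate, format = "%d.%m.%Y")

# For each ID keep the single row whose date is smallest; all columns survive.
oldest <- ddply(myDf, .(ID), function(d) d[which.min(as.numeric(d$myDate)), ])
```

which.min() returns only the first minimum, so ties within an ID would yield one row rather than several.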

Calculate elapsed “times”, where the reference time depends on a factor

Question: I'm trying to calculate elapsed times in a data frame, where the 'start' value for the elapsed time depends on the value of a factor column in the data frame. (To simplify the question, I'll treat the time values as numeric rather than time objects - my question is about split-apply-combine, not time objects). My data frame looks like this:

    df <- data.frame(id=gl(2, 3, 5, labels=c("a", "b")), time=1:5)

I'd like to calculate elapsed times by subtracting the minimum time in each factor level from
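The per-level subtraction described above can be sketched with ddply and transform. The column name `elapsed` is an assumption, since the question is cut off before naming the result.

```r
library(plyr)

df <- data.frame(id = gl(2, 3, 5, labels = c("a", "b")), time = 1:5)

# Within each id, subtract that group's minimum time from every row.
elapsed_df <- ddply(df, .(id), transform, elapsed = time - min(time))
```

A base-R equivalent that avoids splitting the data frame is `df$time - ave(df$time, df$id, FUN = min)`.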

rbinding a list of lists of dataframes based on nested order

Question: I have a dataframe, df, and a function process that returns a list of two dataframes, a and b. I use dlply to split up the df on an id column, and then return a list of lists of dataframes. Here's sample data/code that approximates the actual data and methods:

    df <- data.frame(id1=rep(c(1,2,3,4), each=2))
    process <- function(df) {
      a <- data.frame(d1=rnorm(1), d2=rnorm(1))
      b <- data.frame(id1=df$id1, a=rnorm(nrow(df)), b=runif(nrow(df)))
      list(a=a, b=b)
    }
    require(plyr)
    output <- dlply(df, .(id1
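A sketch of stacking the nested pieces by name. Two things here are assumptions, because the excerpt is cut off: that the dlply call is completed as `dlply(df, .(id1), process)`, and that the goal is one combined data frame of all the `a` pieces and another of all the `b` pieces.

```r
library(plyr)
set.seed(1)  # only so the random columns are reproducible

df <- data.frame(id1 = rep(c(1, 2, 3, 4), each = 2))
process <- function(df) {
  a <- data.frame(d1 = rnorm(1), d2 = rnorm(1))
  b <- data.frame(id1 = df$id1, a = rnorm(nrow(df)), b = runif(nrow(df)))
  list(a = a, b = b)
}

# Assumed completion of the truncated call in the question.
output <- dlply(df, .(id1), process)

# Pull the same-named element out of every group, then row-bind them.
all_a <- do.call(rbind, lapply(output, `[[`, "a"))
all_b <- do.call(rbind, lapply(output, `[[`, "b"))
```

Extracting with `[[` by name rather than by position keeps the result correct even if process() later returns its elements in a different order.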

transform data frame string variable names

Question: I have a data frame that contains dates and IDs. I need to add multiple columns to this data frame based on each date. I use ddply to do this as follows:

    ddply(df, "dt", transform, new_column1 = myfun(column_name_1))

However, I have a bunch of column names and would like to add multiple new columns. Is there a way that I can pass a string to transform instead of new_column1? For example I tried:

    ddply(df, "dt", transform, get("some_column_name")=myfun(column_name_1))

but this does not work.
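The get(...) attempt fails because the left-hand side of `=` in a function call must be a literal argument name, not an expression. One workaround, sketched here with made-up data, is to replace transform with an anonymous function that assigns columns with `[[`, which accepts strings. The body of `myfun`, the `dt` values, and the `_new` suffix are hypothetical.

```r
library(plyr)

# Hypothetical data and helper, standing in for the asker's own.
df <- data.frame(dt = rep(c("2020-01-01", "2020-01-02"), each = 2),
                 column_name_1 = c(1, 2, 3, 4),
                 column_name_2 = c(10, 20, 30, 50))
myfun <- function(x) x - mean(x)

# Source columns given as strings; each gets a matching new column.
src_names <- c("column_name_1", "column_name_2")

res <- ddply(df, "dt", function(d) {
  for (nm in src_names) {
    d[[paste0(nm, "_new")]] <- myfun(d[[nm]])  # string-based column access
  }
  d
})
```

Because the names are ordinary character vectors, they can be built programmatically or read from configuration without touching the ddply call.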