plyr

Applying a function to every row of a table using dplyr?

烂漫一生 · Submitted on 2019-11-26 11:34:05
When working with plyr I often found it useful to use adply for scalar functions that I have to apply to each and every row, e.g.:

    data(iris)
    library(plyr)
    head(adply(iris, 1, transform, Max.Len = max(Sepal.Length, Petal.Length)))

      Sepal.Length Sepal.Width Petal.Length Petal.Width Species Max.Len
    1          5.1         3.5          1.4         0.2  setosa     5.1
    2          4.9         3.0          1.4         0.2  setosa     4.9
    3          4.7         3.2          1.3         0.2  setosa     4.7
    4          4.6         3.1          1.5         0.2  setosa     4.6
    5          5.0         3.6          1.4         0.2  setosa     5.0
    6          5.4         3.9          1.7         0.4  setosa     5.4

Now that I'm using dplyr more, I'm wondering if there is a tidy/natural way to do this? As this is NOT what I want:

    library(dplyr)
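One dplyr approach (a sketch, not taken from the original answers) is rowwise(), which makes a scalar function like max() apply per row; pmax() is a vectorised alternative that avoids the per-row overhead:

```r
library(dplyr)

data(iris)

# rowwise() makes max() evaluate once per row instead of over whole columns
res <- iris %>%
  rowwise() %>%
  mutate(Max.Len = max(Sepal.Length, Petal.Length)) %>%
  ungroup()

# Vectorised alternative: pmax() computes the element-wise maximum directly
res2 <- iris %>% mutate(Max.Len = pmax(Sepal.Length, Petal.Length))
```

Both produce the same Max.Len column as the adply() call above; pmax() is usually preferable when a vectorised equivalent of the scalar function exists.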

dplyr: “Error in n(): function should not be called directly”

a 夏天 · Submitted on 2019-11-26 11:21:43
I am attempting to reproduce one of the examples in the dplyr package, but I get this error message. I am expecting to see a new column n produced with the frequency of each combination. Can someone tell me what I am missing? I triple-checked that the package is loaded. Thanks for the help, as always.

    library(dplyr)

    # summarise peels off a single layer of grouping
    by_vs_am <- group_by(mtcars, vs, am)
    by_vs <- summarise(by_vs_am, n = n())

    Error in n() : This function should not be called directly

mnel: I presume you have dplyr and plyr loaded in the same session. dplyr is not plyr. ddply is not a ...
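A sketch of the usual fix, assuming the clash is indeed between plyr and dplyr as the answer suggests: attach plyr first and dplyr last, so dplyr's summarise() and n() are the ones found on the search path.

```r
library(plyr)   # attach plyr first ...
library(dplyr)  # ... and dplyr last, so its summarise()/n() mask plyr's

by_vs_am <- group_by(mtcars, vs, am)
by_vs <- summarise(by_vs_am, n = n())  # or dplyr::summarise(...) to be explicit
```

Qualifying the call as dplyr::summarise() works regardless of attach order and is the more robust habit.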

Unique rows, considering two columns, in R, without order

六月ゝ 毕业季﹏ · Submitted on 2019-11-26 10:36:15
Unlike the questions I've found, I want the unique rows over two columns, ignoring order. I have a df:

    df <- cbind(c("a","b","c","b"), c("b","d","e","a"))
    > df
         [,1] [,2]
    [1,] "a"  "b"
    [2,] "b"  "d"
    [3,] "c"  "e"
    [4,] "b"  "a"

In this case, row 1 and row 4 are "duplicates" in the sense that a-b is the same as b-a. I know how to find the unique values of columns 1 and 2, but every row would be unique under that approach.

There are lots of ways to do this; here is one:

    unique(t(apply(df, 1, sort)))
    duplicated(t(apply(df, 1, sort)))

One gives the unique rows, the other gives the mask. If it's just two columns, you can ...
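Spelling out the answer's one-liner: sorting each row normalises the pair order, so ("b","a") becomes ("a","b") and order-insensitive duplicates collapse together.

```r
df <- cbind(c("a","b","c","b"), c("b","d","e","a"))

sorted <- t(apply(df, 1, sort))  # row-wise sort: pair order no longer matters
uniq   <- unique(sorted)         # rows 1 and 4 now collapse into one row
```

The duplicated() variant on the same sorted matrix gives a logical mask usable to subset the original df while preserving its original row order.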

Change value of variable with dplyr [duplicate]

女生的网名这么多〃 · Submitted on 2019-11-26 10:08:20
Question: This question already has an answer here: "Set certain values to NA with dplyr" (5 answers)

I regularly need to change the values of a variable based on the values of a different variable, like this:

    mtcars$mpg[mtcars$cyl == 4] <- NA

I tried doing this with dplyr but failed miserably:

    mtcars %>%
      mutate(mpg = mpg == NA[cyl == 4]) %>%
      as.data.frame()

How could I do this with dplyr?

Answer 1: We can use replace to change the values in 'mpg' to NA that correspond to cyl == 4.

    mtcars %>% mutate(mpg ...
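The answer is truncated after naming replace(); a minimal sketch of what the full call presumably looks like:

```r
library(dplyr)

# replace(x, idx, value) returns x with x[idx] set to value, so this mirrors
# the base-R assignment mtcars$mpg[mtcars$cyl == 4] <- NA inside mutate()
res <- mtcars %>% mutate(mpg = replace(mpg, cyl == 4, NA))
```

Unlike the failed attempt, this never compares against NA (which always yields NA); it assigns NA only at the positions where cyl == 4.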

Group by multiple columns and sum other multiple columns

时间秒杀一切 · Submitted on 2019-11-26 08:51:37
I have a data frame with about 200 columns. I want to group the table by the first 10 or so, which are factors, and sum the rest of the columns. I have a list of all the column names I want to group by and a list of all the columns I want to aggregate. The output format I am looking for is the same data frame with the same number of columns, just grouped together. Is there a solution using data.table, plyr, or any other package?

The data.table way is:

    DT[, lapply(.SD, sum), by=list(col1,col2,col3,...)]

or

    DT[, lapply(.SD, sum), by=colnames(DT)[1:10]]

where .SD is the (S ...
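A runnable miniature of the answer's pattern, with a toy table standing in for the 200-column data frame (g1 and g2 play the factor columns):

```r
library(data.table)

DT <- data.table(g1 = c("a", "a", "b"), g2 = c("x", "x", "y"),
                 v1 = 1:3, v2 = 4:6)

# .SD ("Subset of Data") holds every column NOT named in `by`,
# so lapply(.SD, sum) sums all remaining columns within each group
res <- DT[, lapply(.SD, sum), by = .(g1, g2)]
```

The colnames(DT)[1:10] form of `by` does the same thing when the grouping columns are simply the first ten, which matches the question's layout.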

R: speeding up “group by” operations

非 Y 不嫁゛ · Submitted on 2019-11-26 08:46:55
Question: I have a simulation that has a huge aggregate-and-combine step right in the middle. I prototyped this process using plyr's ddply() function, which works great for a huge percentage of my needs. But I need this aggregation step to be faster since I have to run 10K simulations. I'm already scaling the simulations in parallel, but if this one step were faster I could greatly decrease the number of nodes I need. Here's a reasonable simplification of what I am trying to do:

    library(Hmisc) # Set ...
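The excerpt cuts off before the example, but the classic speed-up for this kind of ddply() bottleneck is data.table's grouped aggregation. A hedged sketch on made-up data (not the question's actual simulation):

```r
library(data.table)

set.seed(1)
sim <- data.frame(group = sample(letters[1:5], 1e5, replace = TRUE),
                  value = runif(1e5))

# Grouped sum via data.table: keyed C-level grouping, typically far faster
# than an equivalent ddply() call on inputs of this size
dt  <- as.data.table(sim)
agg <- dt[, .(total = sum(value)), by = group]
```

dplyr's group_by() + summarise() is another common replacement; both avoid plyr's per-group R-level overhead.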

What does the dot mean in R – personal preference, naming convention or more?

浪子不回头ぞ · Submitted on 2019-11-26 07:19:02
Question: I am (probably) NOT referring to the "all other variables" meaning, as in var1 ~ ., here. I was pointed to plyr once again, looked into mlply, and wondered why parameters are defined with a leading dot, like this:

    function (.data, .fun = NULL, ..., .expand = TRUE, .progress = "none",
        .parallel = FALSE)
    {
        if (is.matrix(.data) & !is.list(.data))
            .data <- .matrix_to_df(.data)
        f <- splat(.fun)
        alply(.data = .data, .margins = 1, .fun = f, ..., .expand = .expand,
            .progress = .progress, .parallel = ...
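A hedged illustration of the rationale: plyr dot-prefixes its own parameters so that user arguments forwarded through `...` to .fun cannot be captured by R's partial argument matching.

```r
# Toy stand-in for plyr's pattern: the wrapper's own arguments are dot-prefixed
f <- function(.data, .fun, ...) .fun(.data, ...)

# `add` travels through `...` to the user function untouched; without the dots,
# an argument like `data = ` could partially match the wrapper's own parameter
res <- f(1:5, function(x, add) sum(x) + add, add = 10)
```

With plain names (data, fun), a call such as f(1:5, mean, data = 2) would bind data = 2 to the wrapper's first parameter instead of forwarding it, which is exactly the ambiguity the leading dots avoid.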

Better way to convert list to vector?

若如初见. · Submitted on 2019-11-26 06:43:24
Question: I have a list of named values:

    myList <- list('A' = 1, 'B' = 2, 'C' = 3)

I want a vector with the values 1:3. I can't figure out how to extract the values without defining a function. Is there a simpler way that I'm unaware of?

    library(plyr)
    myvector <- laply(myList, function(x) x)

Is there something akin to myList$Values to strip the names and return the values as a vector?

Answer 1: Use unlist with the use.names = FALSE argument.

    unlist(myList, use.names = FALSE)

Answer 2: purrr::flatten_*() is also a good option. ...
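The accepted one-liner, made runnable end to end:

```r
myList <- list(A = 1, B = 2, C = 3)

# use.names = FALSE drops the names during flattening,
# leaving a plain unnamed numeric vector
v <- unlist(myList, use.names = FALSE)
```

Plain unlist(myList) would instead return a vector carrying the names "A", "B", "C"; use.names = FALSE is what makes it equivalent to stripping them.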

Is there a R function that applies a function to each pair of columns?

二次信任 · Submitted on 2019-11-26 06:38:08
Question: I often need to apply a function to each pair of columns in a data frame/matrix and return the results in a matrix. Right now I always write a loop to do this. For instance, to make a matrix containing the p-values of correlations I write:

    df <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
    n <- ncol(df)
    foo <- matrix(0, n, n)
    for (i in 1:n) {
      for (j in i:n) {
        foo[i, j] <- cor.test(df[, i], df[, j])$p.value
      }
    }
    foo[lower.tri(foo)] <- t(foo)[lower.tri(foo)]
    foo

              [,1]      [,2] [,3]
    [1,] 0.0000000 0.7215071 ...
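A loop-free sketch of the same pairwise pattern using outer() (an alternative I'm suggesting, not code from the original post):

```r
set.seed(42)
df <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))

# outer() evaluates the function at every (i, j) index pair at once;
# Vectorize() lets the scalar cor.test() call be applied element-wise
pvals <- outer(seq_along(df), seq_along(df),
               Vectorize(function(i, j) cor.test(df[[i]], df[[j]])$p.value))
```

This computes both triangles rather than mirroring the upper one, so it does roughly twice the work of the loop; for symmetric functions on many columns, the loop-plus-mirror approach in the question remains the cheaper option.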

meaning of ddply error: 'names' attribute [9] must be the same length as the vector [1]

心不动则不痛 · Submitted on 2019-11-26 05:36:34
Question: I'm going through Machine Learning for Hackers, and I am stuck at this line:

    from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))

which generates the following error:

    Error in attributes(out) <- attributes(col) :
      'names' attribute [9] must be the same length as the vector [1]

This is a traceback():

    > traceback()
    11: FUN(1:5[[1L]], ...)
    10: lapply(seq_len(n), extract_col_rows, df = x, i = i)
    9: extract_rows(x$data, x$index[[i]])
    8: `[[.indexed_df`(pieces, i ...
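This particular 'names' attribute error is commonly reported when dplyr is attached after plyr, so that ddply() ends up calling dplyr's summarise() instead of plyr's. A hedged sketch of the workaround, with a toy data frame standing in for the book's priority.train (which isn't available here):

```r
library(plyr)
library(dplyr)  # attaching dplyr after plyr masks plyr::summarise()

toy <- data.frame(From.EMail = c("a@x", "a@x", "b@y"),
                  Subject    = c("s1", "s2", "s3"))

# Fully qualifying summarise() keeps ddply() using plyr's version,
# which is the one ddply() expects
freq <- plyr::ddply(toy, plyr::.(From.EMail), plyr::summarise,
                    Freq = length(Subject))
```

Detaching dplyr, or restarting and attaching dplyr before plyr, avoids the masking without any qualification, at the cost of losing dplyr's verbs.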