plyr | 易学教程

dplyr: “Error in n(): function should not be called directly”

Question: I am attempting to reproduce one of the examples in the dplyr package, but I get the error message below. I expected to see a new column n containing the frequency of each combination. Can someone tell me what I am missing? I triple-checked that the package is loaded. Thanks for the help, as always.

    library(dplyr)
    # summarise peels off a single layer of grouping
    by_vs_am <- group_by(mtcars, vs, am)
    by_vs <- summarise(by_vs_am, n = n())
    Error in n() : This function should not be called directly

Answer 1:
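A common cause of this error is another attached package (typically plyr) masking dplyr's summarise. The sketch below assumes that is what happened here, which the excerpt does not confirm; it calls the dplyr functions through their namespace so that masking cannot interfere.

```r
library(dplyr)

by_vs_am <- group_by(mtcars, vs, am)

# Namespace-qualified calls always reach the dplyr versions,
# whatever other attached packages export under the same names.
by_vs <- dplyr::summarise(by_vs_am, n = dplyr::n())

print(by_vs)
```

Running conflicts() or search() in the same session shows whether plyr is attached ahead of dplyr.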

Unique rows, considering two columns, in R, without order

Question: Unlike the questions I've found, I want to get the unique rows of two columns without regard to order. I have a df:

    df <- cbind(c("a","b","c","b"), c("b","d","e","a"))
    > df
         [,1] [,2]
    [1,] "a"  "b"
    [2,] "b"  "d"
    [3,] "c"  "e"
    [4,] "b"  "a"

In this case, row 1 and row 4 are "duplicates" in the sense that a-b is the same as b-a. I know how to find the unique combinations of columns 1 and 2, but that approach would treat every row as unique.

Answer 1: There are lots of ways to do this; here is one:

    unique(t(apply(df, 1, sort)))
    duplicated(t
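The approach in the answer can be run end to end as follows. This is a minimal sketch in base R; the names sorted_rows, unique_pairs and keep are introduced here for illustration and do not come from the original answer.

```r
df <- cbind(c("a", "b", "c", "b"), c("b", "d", "e", "a"))

# Sort within each row so that ("b", "a") becomes ("a", "b").
# apply() returns the sorted rows as columns, hence the transpose.
sorted_rows <- t(apply(df, 1, sort))

# Unique unordered pairs, in sorted form.
unique_pairs <- unique(sorted_rows)

# Alternatively, keep the original rows whose sorted form has not been seen yet.
keep <- !duplicated(sorted_rows)
deduplicated <- df[keep, , drop = FALSE]

print(unique_pairs)
print(deduplicated)
```

The duplicated() variant preserves the original element order within each kept row, which unique() on the sorted matrix does not.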

Idiomatic R code for partitioning a vector by an index and performing an operation on that partition

Question: I'm trying to find the idiomatic way in R to partition a numerical vector by some index vector, find the sum of all numbers in each partition, and then divide each individual entry by its partition sum. In other words, if I start with this:

    df <- data.frame(x = c(1,2,3,4,5,6), index = c('a', 'a', 'b', 'b', 'c', 'c'))

I want the output to be a vector (let's call it z):

    c(1/(1+2), 2/(1+2), 3/(3+4), 4/(3+4), 5/(5+6), 6/(5+6))

If I were doing this in SQL and could use window functions, I
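One base R way to get this result is ave(), which returns the group-wise value of a function aligned with the original vector, much like a SQL window function. This is a sketch of one possible approach, not necessarily the answer the original thread settled on.

```r
df <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                 index = c("a", "a", "b", "b", "c", "c"))

# ave() computes sum(x) within each level of index and repeats it
# for every element of that group, so the division is element-wise.
group_sum <- ave(df$x, df$index, FUN = sum)
z <- df$x / group_sum

print(z)
```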

Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?

Question: Note: The title of this question has been edited to make it the canonical question for issues when plyr functions mask their dplyr counterparts. The rest of the question remains unchanged. Suppose I have the following data:

    dfx <- data.frame(
      group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
      sex = sample(c("M", "F"), size = 29, replace = TRUE),
      age = runif(n = 29, min = 18, max = 54)
    )

With the good old plyr I can create a little table summarizing my data with the following code:

    require

rollsum with fixed dates

Question: I have a data frame that looks like this:

    user_id  date                     price
    2375     2012/12/12 00:00:00.000  47.900000
    2375     2013/01/16 00:00:00.000  47.900000
    2375     2013/01/16 00:00:00.000  47.900000
    2375     2013/05/08 00:00:00.000  47.900000
    2375     2013/06/01 00:00:00.000  47.900000
    2375     2013/10/02 00:00:00.000  26.500000
    2375     2014/01/22 00:00:00.000  47.900000
    2375     2014/03/21 00:00:00.000  47.900000
    2375     2014/05/24 00:00:00.000  47.900000
    2375     2015/04/11 00:00:00.000  47.900000
    7419     2012/12/12 00:00:00.000  7.174977
    7419

Grouping functions (tapply, by, aggregate) and the *apply family

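Question: Whenever I want to do something "map"py in R, I usually try to use a function in the apply family. However, I've never quite understood the differences between them: how {sapply, lapply, etc.} apply the function to the input or grouped input, what the output will look like, or even what the input can be. So I often just go through them all until I get what I want. Can someone explain when to use which one? My current (probably incorrect or incomplete) understanding is:

sapply(vec, f): the input is a vector. The output is a vector/matrix, where element i is f(vec[i]), giving you a matrix if f has a multi-element output.

lapply(vec, f): same as sapply, but the output is a list?

apply(matrix, 1/2, f): the input is a matrix. The output is a vector, where element i is f(row/column i of the matrix).

tapply(vector, grouping, f): the output is a matrix/array, where an element in the matrix/array is the value of f at a grouping g of the vector, and g gets pushed to the row/column names.

by(dataframe, grouping, f): let g be a grouping. Apply f to each column of the group/data frame. Pretty-print the grouping and the value of f at each column.

aggregate(matrix, grouping, f): similar to by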
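The differences listed above are easiest to see by running each function on a tiny input. The sketch below uses made-up vectors and a made-up data frame d purely for illustration; it shows output shapes and does not settle the questions the asker raises about by.

```r
vec <- c(1, 4, 9)

s_vec <- sapply(vec, sqrt)                      # numeric vector of length 3
s_mat <- sapply(vec, function(v) c(v, v^2))     # 2 x 3 matrix: f returns 2 values
l_out <- lapply(vec, sqrt)                      # list of length 3

m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)                    # one value per row
col_sums <- apply(m, 2, sum)                    # one value per column

# tapply: one value per group, with group labels as names.
group_sums <- tapply(c(1, 2, 3, 4), c("a", "a", "b", "b"), sum)

# aggregate: the same grouped sum, returned as a data frame.
d <- data.frame(x = c(1, 2, 3, 4), g = c("a", "a", "b", "b"))
agg <- aggregate(x ~ g, data = d, FUN = sum)

str(list(s_vec, s_mat, l_out, row_sums, col_sums, group_sums, agg))
```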

Accessing fitted.values when using ddply

Question: I am using ddply to execute glm on subsets of my data. I am having difficulty accessing the estimated Y values. I am able to get the model parameter estimates using the code below, but all the variations I've tried to get the fitted values have fallen short. The dependent and independent variables in the glm model are column vectors, as is the "Dmsa" variable used in the ddply operation. Define the model:

    Model <- function(df){coef(glm(Y~D+O+B+A+log(M), family=poisson(link="log"), data=df))}
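One way to get fitted values per subset is to have the per-group function return the group's rows with a fitted column attached, rather than the coefficients. The sketch below is an assumption-laden illustration: the asker's data is not shown, so it uses the built-in warpbreaks data set, and it swaps ddply for base R split/lapply/rbind. The function fit_group is a hypothetical name; a function of this shape, taking and returning a data frame, is also what ddply accepts.

```r
# Hypothetical per-group function: fit a Poisson glm on one subset
# and return that subset with the fitted values as a new column.
fit_group <- function(d) {
  model <- glm(breaks ~ tension, family = poisson(link = "log"), data = d)
  cbind(d, fitted = fitted(model))
}

# Base R stand-in for ddply: split by group, apply, and recombine.
# warpbreaks$wool plays the role of the asker's grouping variable.
pieces <- lapply(split(warpbreaks, warpbreaks$wool), fit_group)
with_fitted <- do.call(rbind, pieces)

head(with_fitted)
```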

R - Speeding up approximate date match. idata.frame?

Question: I am struggling to efficiently perform a "close" date match between two data frames. This question explores a solution using idata.frame from the plyr package, but I would be very happy with other suggested solutions as well. Here is a very simplified version of the two data frames:

    sampleticker<-data.frame(cbind(ticker=c("A","A","AA","AA"), date=c("2005-1-25","2005-03-30","2005-02-15","2005-04-21")))
    sampleticker$date<-as.Date(sampleticker$date,format="%Y-%m-%d")
    samplereport<-data.frame

How can I correlate against multiple columns using ddply?

Question: I have a data.frame and I want to calculate correlation coefficients using one column against the other columns (there are some non-numeric columns in the frame as well).

    ddply(Banks,.(brand_id,standard.quarter),function(x) { cor(BLY11,x) })
    # Error in cor(BLY11, x) : 'y' must be numeric

I tested against is.numeric(x):

    ddply(Banks,.(brand_id,standard.quarter),function(x) { if (is.numeric(x)) cor(BLY11,x) else 0 })

but that failed every comparison and returned 0 and returned only one column, as
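The is.numeric(x) test fails because the function receives each group as a whole data frame, and a data frame is never numeric; the check has to be made column by column. The sketch below illustrates that idea under stated assumptions: the Banks data is not shown, so the built-in iris data set stands in for it, Sepal.Length plays the role of BLY11, and base R split/lapply replaces ddply.

```r
# A data frame as a whole is not numeric, even if most columns are.
whole_frame_numeric <- is.numeric(iris)   # FALSE

cor_by_group <- lapply(split(iris, iris$Species), function(d) {
  # Keep only the numeric columns of this group.
  num <- d[sapply(d, is.numeric)]
  # Correlate the reference column against each remaining numeric column.
  others <- setdiff(names(num), "Sepal.Length")
  sapply(num[others], function(col) cor(d$Sepal.Length, col))
})

# One row per group, one column per correlated variable.
result <- do.call(rbind, cor_by_group)
print(result)
```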