ddply for sum by group in R

爱⌒轻易说出口 submitted on 2019-11-27 12:24:08

As pointed out in a comment, you can perform multiple operations inside a single summarize call.

This reduces your code to one line of ddply() and one line of subsetting, which is easy enough with the [ operator:

library(plyr)
x <- ddply(data, .(Y), summarize, freq=length(Y), tot=sum(income))
x[x$freq > 3, ]

       Y freq  tot
3 228122    4 6778
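
To try the two steps without the original data, here is a minimal sketch on a made-up data frame. The name `toy` and all of its values are assumptions for illustration only; they are not the asker's data.

```r
library(plyr)

# Hypothetical toy data (not the data set from the question)
toy <- data.frame(
  Y      = c(1, 1, 1, 1, 2, 2, 3),
  income = c(10, 20, 30, 40, 5, 5, 7)
)

# One row per value of Y, holding the group size and the income total
x <- ddply(toy, .(Y), summarize, freq = length(Y), tot = sum(income))

# Keep only the groups that have more than three rows
res <- x[x$freq > 3, ]
print(res)
```

Only the group Y = 1 has more than three rows, so it is the only one that survives the subsetting step.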

This is also exceptionally easy with the data.table package:

library(data.table)
data.table(data)[, list(freq=length(income), tot=sum(income)), by=Y][freq > 3]

        Y freq  tot
1: 228122    4 6778

In fact, data.table has its own shortcut for counting the rows in each group, the special symbol .N:

data.table(data)[, list(freq=.N, tot=sum(income)), by=Y][freq > 3]

        Y freq  tot
1: 228122    4 6778
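
To check that .N and length() agree within each group, here is a small sketch on a made-up table. The name `toy`, its values, and the column names `by_length` and `by_N` are assumptions for illustration only.

```r
library(data.table)

# Hypothetical toy data (not the data set from the question)
toy <- data.table(
  Y      = c(1, 1, 1, 1, 2, 2, 3),
  income = c(10, 20, 30, 40, 5, 5, 7)
)

# Count the rows per group in two ways: length() of a column and .N
counts <- toy[, list(by_length = length(income), by_N = .N), by = Y]
print(counts)

# Same summary and filter as above, using .N for the group size
res <- toy[, list(freq = .N, tot = sum(income)), by = Y][freq > 3]
print(res)
```

Both count columns come out the same, so .N saves naming a column just to measure its length.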
HatMatrix

I think the dplyr package is faster than plyr::ddply and more elegant.

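# Read the data from the clipboard (copy the table, including its header row, first)
testData <- read.table(file = "clipboard", header = TRUE)
library(dplyr)
testData %>%
  group_by(Y) %>%
  summarise(total = sum(income), freq = n()) %>%  # per-group total and row count
  filter(freq > 3)                                # keep groups with more than 3 rows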
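
If you would rather not depend on the clipboard, the same pipeline can be sketched on a made-up data frame. The name `toy` and its values are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical toy data (not the data set from the question)
toy <- data.frame(
  Y      = c(1, 1, 1, 1, 2, 2, 3),
  income = c(10, 20, 30, 40, 5, 5, 7)
)

res <- toy %>%
  group_by(Y) %>%                                 # one group per value of Y
  summarise(total = sum(income), freq = n()) %>%  # per-group total and row count
  filter(freq > 3)                                # keep groups with more than 3 rows

print(res)
```

Here n() plays the same role as length(Y) in the ddply version and .N in the data.table version.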