Question
I have a data.table in this fashion:
dd <- data.table(f = c("a", "a", "a", "b", "b"), g = c(1,2,3,4,5))
dd
I need to sum the values of g by factor f, and finally return a single-row data.table object that has the maximum value of g but also contains the factor information, i.e.:
   f g
1: b 9
My closest attempt so far is:
tmp3 <- dd[, sum(g), by = f][, max(V1)]
tmp3
Which results in:
> tmp3
[1] 9
EDIT: Ideally I'm looking for a purely data.table piece of code/workflow. I'm surprised that, with all the fast split-apply-combine wizardry and the ability to subset data in the form example[i = subset, ], I haven't found a straightforward way to subset on a single-value condition.
Answer 1:
Here's one way to do it:
library(data.table)
dd <- data.table(
  f = c("a", "a", "a", "b", "b"),
  g = c(1, 2, 3, 4, 5))

# Sum g within each f, then keep the row holding the largest sum
dd[, list(g = sum(g)), by = f][which.max(g), ]
#    f g
# 1: b 9
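The attempt in the question returns a bare number because max(V1) sits in the j slot, which evaluates to a plain vector; moving the condition into the i slot keeps the whole row. A minimal sketch of two variations on the same idea follows, assuming the dd table from the question. The names sums, res_ties and res_first are made up for illustration. Note that which.max() returns only the first maximum, whereas g == max(g) keeps every tied group.

```r
library(data.table)

dd <- data.table(f = c("a", "a", "a", "b", "b"), g = c(1, 2, 3, 4, 5))

# Step 1: one row per group, holding the summed g
sums <- dd[, .(g = sum(g)), by = f]

# Step 2a: logical subset in i; keeps every group tied for the maximum
res_ties <- sums[g == max(g)]

# Step 2b: sort by descending g and take the first row; always one row
res_first <- sums[order(-g)][1]

print(res_ties)
print(res_first)
```

Which variant fits depends on whether tied groups should all be returned or only one of them.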
Answer 2:
You can use dplyr syntax on a data.table, in this case:
library(dplyr)
dd %>%
  group_by(f) %>%
  summarise(g = sum(g)) %>%
  top_n(1, g)
Source: local data table [1 x 2]
f g
1 b 9
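In dplyr 1.0.0 and later, top_n() is marked as superseded in favour of slice_max(). A sketch of the same pipeline with that function is below, assuming a recent dplyr is installed; the name res is made up for illustration, and the class of the returned object may differ from the output shown above depending on the dplyr version.

```r
library(data.table)
library(dplyr)

dd <- data.table(f = c("a", "a", "a", "b", "b"), g = c(1, 2, 3, 4, 5))

# Sum g per group, then keep the row(s) with the largest sum.
# slice_max() keeps ties by default (with_ties = TRUE).
res <- dd %>%
  group_by(f) %>%
  summarise(g = sum(g)) %>%
  slice_max(g, n = 1)

print(res)
```

Passing with_ties = FALSE to slice_max() would force a single row even when several groups share the maximum.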
Source: https://stackoverflow.com/questions/29211800/data-table-sum-by-group-and-return-row-with-max-value