data.table sum by group and return row with max value

Posted by 时间秒杀一切 on 2020-01-02 07:41:12

Question


I have a data.table in this fashion:

dd <- data.table(f = c("a", "a", "a", "b", "b"), g = c(1,2,3,4,5))
dd

I need to sum the values of g within each level of the factor f, and then return a single-row data.table that holds the maximum of those sums together with the factor level it belongs to, i.e.

   f g
1: b 9

My closest attempt so far is

tmp3 <- dd[, sum(g), by = f][, max(V1)]
tmp3

Which results in:

> tmp3
[1] 9

EDIT: Ideally I'm looking for a purely data.table piece of code/workflow. Given all the fast split-apply-combine machinery and the ability to subset data in the form `example[i = subset, ]`, I'm surprised that I haven't found a straightforward way to subset on a single value condition.


Answer 1:


Here's one way to do it:

library(data.table)
dd <- data.table(
  f = c("a", "a", "a", "b", "b"), 
  g = c(1,2,3,4,5))
## Sum g within each f, then keep the row with the largest sum
> dd[,list(g = sum(g)),by=f][which.max(g),]
   f g
1: b 9
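
Note that `which.max()` returns only the first position of the maximum, so if two groups tie for the largest sum, only one of them is kept. As a minimal sketch of the "subset on a single value condition" the question asks about, the second bracket can instead filter with `g == max(g)`, which keeps every tied group. The `tied` table below is hypothetical data made up for illustration; it is not part of the original question.

```r
library(data.table)

dd <- data.table(
  f = c("a", "a", "a", "b", "b"),
  g = c(1, 2, 3, 4, 5)
)

# Step 1: sum g within each level of f.
sums <- dd[, .(g = sum(g)), by = f]

# Step 2: subset on a value condition in i.
# Unlike which.max(), this keeps all rows that share the maximum.
top <- sums[g == max(g)]
print(top)

# Hypothetical data (an assumption for illustration):
# groups "a" and "b" both sum to 6.
tied <- data.table(
  f = c("a", "a", "a", "b", "b"),
  g = c(1, 2, 3, 2, 4)
)
tied_top <- tied[, .(g = sum(g)), by = f][g == max(g)]
print(tied_top)
```

If exactly one row is required even when groups tie, the `which.max(g)` form is the safer choice.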




Answer 2:


You can use dplyr syntax on a data.table, in this case:

library(dplyr)
dd %>%
  group_by(f) %>%
  summarise(g = sum(g)) %>%
  top_n(1, g)

Source: local data table [1 x 2]

  f g
1 b 9
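
In recent dplyr releases (1.0.0 and later), `top_n()` is marked as superseded in favour of `slice_max()`. A sketch of the same pipeline with `slice_max()` might look like the following; it assumes dplyr 1.0.0 or newer is installed, and it uses a plain data.frame rather than a data.table only to keep the example self-contained.

```r
library(dplyr)

# Same data as in the question, built as a plain data.frame
# (an assumption made to avoid depending on data.table here).
dd <- data.frame(
  f = c("a", "a", "a", "b", "b"),
  g = c(1, 2, 3, 4, 5)
)

res <- dd %>%
  group_by(f) %>%
  summarise(g = sum(g)) %>%  # one row per level of f
  slice_max(g, n = 1)        # keep the row(s) with the largest sum

print(res)
```

By default `slice_max()` keeps ties (`with_ties = TRUE`), so it behaves like `top_n(1, g)` when several groups share the maximum.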


Source: https://stackoverflow.com/questions/29211800/data-table-sum-by-group-and-return-row-with-max-value
