Grouping in data.table: how to get more than 1 column of results?

Submitted by 扶醉桌前 on 2020-01-01 10:02:22

Question


I have a data.table object like this one:

library(data.table)

a <- structure(list(PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L), 
                    SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L), 
                    PRC = c(6.5, 6.125, 0.75, 0.5, 3, 4), 
                    RET = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031)),
                   .Names = c("PERMNO", "SHROUT", "PRC", "RET"), 
               class = c("data.table", "data.frame"), row.names = c(NA, -6L))

setkey(a,PERMNO)

and I need to perform a number of calculations by PERMNO; for this example, let's suppose there are only two:

mktcap <- a[ , tail(SHROUT,n=1)*tail(PRC,n=1),by=PERMNO]
sqret <- a[, sum(RET^2),by=PERMNO]

which produce

> mktcap
     PERMNO       V1
[1,]  10006 8740.375
[2,]  10015  500.500
[3,]  20000  800.000

> sqret
     PERMNO        V1
[1,]  10006 5.000e-05
[2,]  10015 2.501e-03
[3,]  20000 1.361e-05

I would like to combine the two calls into one, to produce a matrix (or data.table, data.frame, whatever) with 3 columns: the first with the PERMNOs, the second with mktcap and the third with sqret.

The problem is that this grouping function (i.e. variable[ , function(), by= ]) seems to only produce results with two columns, one with the keys and one with results.

This is my attempt (one of many) to produce what I want:

comb.fun <- function(datai) {
     mktcap <- as.matrix(tail(datai[,1],n=1)*tail(datai[,2],n=1),ncol=1)
     sqret <- as.matrix(sum(datai[,3]^2),ncol=1)
     return(c(mktcap,sqret))
}   

myresults <- a[, comb.fun(cbind(SHROUT,PRC,RET)), by=PERMNO]

which produces

     PERMNO           V1
[1,]  10006 8.740375e+03
[2,]  10006 5.000000e-05
[3,]  10015 5.005000e+02
[4,]  10015 2.501000e-03
[5,]  20000 8.000000e+02
[6,]  20000 1.361000e-05

(the results are all there, but they were forced into one column). No matter what I try, I cannot get grouping to return a matrix with more than two columns (or more than one column of results).

Is it possible to get two or more columns of results with grouping in data.table?


Answer 1:


The answer (using list() to collect the several desired summary stats) is there in the excellent Examples section of the ?data.table help file. (It's about 20 lines up from the bottom).

out <- a[ , list(mktcap = tail(SHROUT,n=1)*tail(PRC,n=1),
                 sqret  = sum(RET^2)),
         by=PERMNO]

out
#    PERMNO   mktcap     sqret
# 1:  10006 8740.375 5.000e-05
# 2:  10015  500.500 2.501e-03
# 3:  20000  800.000 1.361e-05
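
The same rule explains why the comb.fun attempt in the question stacked everything into one column: it returned a vector built with c(), and a vector in j becomes a single result column, whereas each element of a list becomes its own column. A minimal sketch of that fix, assuming a hypothetical helper named comb.fun2 (not part of the original question) and rebuilding the sample data with data.table():

```r
library(data.table)

# Same sample data as in the question, built with data.table() for brevity
a <- data.table(PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L),
                SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L),
                PRC    = c(6.5, 6.125, 0.75, 0.5, 3, 4),
                RET    = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031))

# Hypothetical helper: returns a named list instead of c(...),
# so each list element becomes its own column in the grouped result
comb.fun2 <- function(shrout, prc, ret) {
  list(mktcap = tail(shrout, n = 1) * tail(prc, n = 1),
       sqret  = sum(ret^2))
}

res <- a[, comb.fun2(SHROUT, PRC, RET), by = PERMNO]
res
```

Passing the columns as separate arguments also avoids the cbind() step, which coerces everything to a single matrix type.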

Edit:

In the comments below, Matthew Dowle describes a simple way to clean up code in which the j argument in calls like x[i,j,by] is getting awkwardly long.

Implementing his suggestion on the call above, you could instead do:

## 1) Use quote() to make an expression object out of the statement passed to j
mm <- quote(list(mktcap = tail(SHROUT,n=1)*tail(PRC,n=1),
                 sqret  = sum(RET^2)))

## 2) Use eval() to evaluate it as if it had been typed directly in the call
a[ , eval(mm), by=PERMNO]
#    PERMNO   mktcap     sqret
# 1:  10006 8740.375 5.000e-05
# 2:  10015  500.500 2.501e-03
# 3:  20000  800.000 1.361e-05
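
One practical benefit of storing the expression is that it can be reused across several calls. The sketch below is only an illustration under assumed conditions: the RET > 0 filter is a made-up example, not something from the original question.

```r
library(data.table)

# Same sample data as in the question
a <- data.table(PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L),
                SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L),
                PRC    = c(6.5, 6.125, 0.75, 0.5, 3, 4),
                RET    = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031))

# Store the j expression once
mm <- quote(list(mktcap = tail(SHROUT, n = 1) * tail(PRC, n = 1),
                 sqret  = sum(RET^2)))

# Reuse it on all rows ...
all_rows <- a[, eval(mm), by = PERMNO]

# ... and on a subset (illustrative filter: rows with positive returns only)
pos_rows <- a[RET > 0, eval(mm), by = PERMNO]
```

Both results have the same three columns (PERMNO, mktcap, sqret); only the rows that feed each group differ.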



Answer 2:


How about:

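comb.fun <- function(a) {
  # Compute each summary separately, naming the result columns
  mktcap <- a[, list(mktcap = tail(SHROUT,n=1)*tail(PRC,n=1)), by=PERMNO]
  sqret  <- a[, list(sqret = sum(RET^2)), by=PERMNO]

  # Join the two results on the grouping column (explicit by= avoids
  # depending on whether the inputs happen to be keyed)
  return(merge(mktcap, sqret, by="PERMNO"))
}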


Source: https://stackoverflow.com/questions/11233183/grouping-in-data-table-how-to-get-more-than-1-column-of-results
