data.table: Using with=FALSE and transforming function/summary function?

Posted by 筅森魡賤 on 2019-12-23 10:17:04

Question


I want to summarise several variables in a data.table and get the output in wide format, possibly as a list with one element per variable. Since several other approaches did not work, I tried an outer lapply that takes the variable names as character vectors. I wanted to pass these names in using with=FALSE.

library(data.table)
carsx <- as.data.table(cars)
lapply( list(speed="speed", dist="dist"), #error object 'ansvals' not found
    function(x)  carsx[, list(mean(x), min(x), max(x)), with=FALSE ] )

Since this does not work, I tried a simpler approach without lapply.

carsx[,list(mean("speed"), min("speed"), max("speed") ), with=FALSE ] #error object 'ansvals' not found

This does not work either. Is there any way to do something like this? Is this behaviour of with intended? (I am aware that ?data.table mentions with only for selecting columns, but in my case it would be useful to be able to transform them as well.) The documentation says:

When with=FALSE, j is a vector of names or positions to select, similar to a data.frame. with=FALSE is often useful in data.table to select columns dynamically.
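As a minimal sketch of what that documentation sentence describes, the snippet below selects columns by name with with=FALSE. The variable names cols, selected and string_mean are illustrative only. It also shows why mean("speed") cannot work as a column summary: a quoted name is an ordinary string, and base R's mean() on a string returns NA with a warning.

```r
library(data.table)

carsx <- as.data.table(cars)
cols <- c("speed", "dist")   # column names held in a character vector

# with = FALSE: j is read as a vector of column names (or positions) to select
selected <- carsx[, cols, with = FALSE]

# A quoted name is just text, so mean() is applied to the string itself,
# not to the column of that name; base R returns NA (with a warning).
string_mean <- suppressWarnings(mean("speed"))
```

Here selected holds both columns for all 50 rows of cars, while string_mean is NA.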

EDIT: My aim is to get a summary per group in wide format for several variables. I tried to extend the following, which works for only one variable, to a list of variables.

carsx[,list(mean(speed), min(speed), max(speed) ) ,by=(dist>50)]

Unfortunately, SO doesn't let me post my other question. There I described that I want output similar to:

lapply( list(speed="speed",dist= "dist"),
        function(x) do.call("as.data.frame", aggregate(cars[,x], list(class=cars$dist>50), FUN=summary) ) )

Expected Output would be something like:

$speed 
         V1       V2 V3
1: FALSE 12.96970  4 20
2:  TRUE 20.11765 14 25

$dist
         V1       V2 V3
1: FALSE 12.96970  4 20
2:  TRUE 20.11765 14 25

Answer 1:


You can specify the columns with the .SDcols parameter:

carsx[ , lapply(.SD, function(x) c(mean(x), min(x), max(x))), 
      .SDcols = c("speed", "dist")]
#    speed   dist
# 1:  15.4  42.98
# 2:   4.0   2.00
# 3:  25.0 120.00

carsx[ , lapply(.SD, function(x) c(mean(x), min(x), max(x))), 
      .SDcols = "speed"]
#    speed
# 1:  15.4
# 2:   4.0
# 3:  25.0
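The rows in the output above are not labelled, so the reader has to remember that they are mean, min and max in that order. One possible variation, offered as a sketch rather than as part of the original answer, prepends a label column (the name stat is an assumption chosen for illustration):

```r
library(data.table)

carsx <- as.data.table(cars)

# Combine a label column with the per-column summaries; every list element
# has length 3, so the pieces line up row by row.
labelled <- carsx[, c(list(stat = c("mean", "min", "max")),
                      lapply(.SD, function(x) c(mean(x), min(x), max(x)))),
                  .SDcols = c("speed", "dist")]
labelled
#    stat speed   dist
# 1: mean  15.4  42.98
# 2:  min   4.0   2.00
# 3:  max  25.0 120.00
```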



Answer 2:


Building on Sven's answer, a combination of .SDcols, rbindlist, and an outer and an inner lapply did the trick. The inner lapply is necessary to access .SD.

lapply( list(speed="speed", dist="dist"),
    function(col)  carsx[ , rbindlist(lapply(.SD, function(x) list(mean=mean(x), min=min(x), max=max(x)) )),
                       .SDcols = col, by = (dist>50)] )

Result:

$speed
    dist     mean min max
1: FALSE 12.96970   4  20
2:  TRUE 20.11765  14  25

$dist
    dist     mean min max
1: FALSE 27.84848   2  50
2:  TRUE 72.35294  52 120
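In the result above the grouping column is named dist even though it holds the logical dist>50, which can be confusing in the $dist element. A hedged variation on the same idea gives the grouping column its own name and reads the single selected column directly from .SD, which avoids the inner lapply and rbindlist. The function name summarise_by and the column name far are assumptions made for this sketch.

```r
library(data.table)

carsx <- as.data.table(cars)

# Hypothetical helper: summarise one column, named by a string, per group.
summarise_by <- function(col) {
  carsx[, {
          v <- .SD[[1L]]   # the only column selected via .SDcols
          list(mean = mean(v), min = min(v), max = max(v))
        },
        by = .(far = dist > 50),   # named grouping column instead of "dist"
        .SDcols = col]
}

res <- lapply(c(speed = "speed", dist = "dist"), summarise_by)
```

res is a named list with one data.table per variable, each holding the columns far, mean, min and max.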


Source: https://stackoverflow.com/questions/26844251/data-table-using-with-false-and-transforming-function-summary-function
