R - pass fixed columns to lapply function in data.table

我怕爱的太早我们不能终老 提交于 2020-01-03 04:54:06

问题


I have a data.table with columns p1, p2, ... which contains percentages. I want to compute the quantiles for each columns given a reference variable val. Conceptually, this is like:

quantile(val, p1, type = 4, na.rm = T)
quantile(val, p2, type = 4, na.rm = T)
...

My attempt at using data.table is as follows:

fun <- function(x, y) quantile(y, x, type = 4, na.rm = T)
dt[, c('q1', 'q2') := lapply(.SD, fun), .SDcols = c('p1', 'p2'), by = grp]
where grp is some grouping variable

However, I am having trouble specifying the y variable in a way that keeps it fixed.

I tried the following:

fun <- function(x, y, dt) quantile(dt[, y], x, type = 4, na.rm = T)
dt[, c('q1', 'q2') := lapply(.SD, fun, y, dt), .SDcols = c('p1', 'p2'), by = grp]

But doing it this fashion does not enforce the grouping when the quantiles are computed. It will compute the quantile based on the whole range of the y variable instead of the y within groups. What is the correct way to do this?

EDIT:

Here is a trivial example of just one variable:

> dt <- data.table(y = 1:10, p1 = rep(seq(0.2, 1, 0.2), 2), g = c(rep('a', 5), rep('b', 5)))
> dt
     y  p1 g
 1:  1 0.2 a
 2:  2 0.4 a
 3:  3 0.6 a
 4:  4 0.8 a
 5:  5 1.0 a
 6:  6 0.2 b
 7:  7 0.4 b
 8:  8 0.6 b
 9:  9 0.8 b
10: 10 1.0 b
> fun <- function(x, dt, y) quantile(dt[, y], x, type = 4, na.rm = T)
> dt[, c('q1') := lapply(.SD, fun, dt, y), .SDcols = c('p1'), by = c('g')]
> dt
     y  p1 g q1
 1:  1 0.2 a  2
 2:  2 0.4 a  4
 3:  3 0.6 a  6
 4:  4 0.8 a  8
 5:  5 1.0 a 10
 6:  6 0.2 b  2
 7:  7 0.4 b  4
 8:  8 0.6 b  6
 9:  9 0.8 b  8
10: 10 1.0 b 10

You can see q1 is computed using the entire range of y.


回答1:


I find the idea that you would store the percentages you require in the same data.table as the data with which you wish to calculate the quantiles very strange, however here is an approach that will work

dt <- data.table(x=10:1,y = 1:10, p1 = rep(seq(0.2, 1, 0.2), 2), g = c(rep('a', 5), rep('b', 5)))


dt[, c('qx','qy') := Map(f = quantile, x = list(x, y), prob = list(p1), type = 4), by = g]

You can use .SDcols on within .SD to select the columns you want

dt[, c('qx','qy') := Map(f = quantile, x = .SD[, .SDcols = c('x','y')], 
                         prob = list(p1), type = 4), by = g]

Or use with =FALSE

dt[, c('qx','qy') := Map(f = quantile, x = .SD[, c('x', 'y')], 
                          prob = list(p1), type = 4), by = g]


来源:https://stackoverflow.com/questions/20230227/r-pass-fixed-columns-to-lapply-function-in-data-table

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!