scale in data.table in R

谁说胖子不能爱 submitted on 2019-12-07 09:18:15

Question


       LC  RC TOEIC eua again  class
   1: 490 390   880  90     0 100818
   2: 495 395   890  90     0 100818
   3: 490 330   820  90     0 100818
   4: 495 460   955  96     0 100818
   5: 495 370   865  91     0 100818
  ---                               
1021: 470 400   870  61     0 100770
1022: 260 180   440  48     0 100770
1023: 345 190   535  39     0 100770
1024: 450 295   745  65     0 100770
1025: 395 230   625  79     0 100770

This data.table is named "analy".

I want to scale the variables "LC", "RC", "TOEIC" and "eua". I can scale them one at a time, as below:

analy[,LC:=scale(LC)]
analy[,RC:=scale(RC)]
analy[,TOEIC:=scale(TOEIC)]
analy[,eua:=scale(eua)]

But I want to know how to scale all of these variables at once.


Answer 1:


analy[ , c("LC", "RC", "TOEIC", "eua") := lapply(list(LC, RC, TOEIC, eua), scale)] 

A slightly more convenient way of doing it (as @David mentions in the comments) would be:

cols <- c("LC", "RC", "TOEIC", "eua")
analy[, (cols) := lapply(.SD, scale), .SDcols=cols]

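Note that the parentheses around cols are necessary: they make data.table evaluate cols to get the column names, and those columns are then modified by reference. Without the parentheses, cols would be taken as the literal name of a single column, which is what allows the usual DT[, col := val] form to keep working.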
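
As a rough illustration of that difference, here is a minimal sketch on a made-up three-row table. The values and the divide-by-ten transformation are assumptions chosen only to keep the output easy to read; they are not from the question.

```r
library(data.table)

# Toy data: column names mirror the question, values are invented.
DT <- data.table(LC = c(490, 495, 260), RC = c(390, 395, 180))
cols <- c("LC", "RC")

# With parentheses, cols is evaluated, so LC and RC are overwritten in place.
DT[, (cols) := lapply(.SD, function(x) x / 10), .SDcols = cols]

# Without parentheses, a new column literally named "cols" is created.
DT[, cols := 1L]

names(DT)
```

After both calls the table should contain LC, RC and an extra column called cols, which is usually not what was intended.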




Answer 2:


This is related to a more general question on pre-processing columns in data.table using .SD posted here: Computing inter-value differences in data.table columns (with .SD) in R.

Here's the answer to your question, along with what you get if you use the scale() function incorrectly:

DT <- data.table(K=c(rep(1,5),rep(2,5)), X=(1:10)^2, Y=2^(1:10))
cols <- 2:3
cols.d0 <- paste0("d0.", names(DT)[cols])

# Correct and incorrect use of scale() with data.table

# Works for one column.
DT[, d0_Y:= scale(Y), keyby=K][]

# RUNS BUT GIVES THE WRONG RESULT! ==> returns a 20-row data.table (DT has only 10 rows)
DT[, scale(.SD), keyby=K, .SDcols=cols][]

# RUNS WITH A WARNING AND GIVES THE WRONG RESULT! d0.X is computed correctly, but d0.Y is not (compare with d0_Y)
DT[, (cols.d0) := scale(.SD), keyby=K, .SDcols=cols][]
>     K   X    Y       d0_Y        d0.X        d0.Y
   1: 1   1    2 -0.8525736 -1.03417538 -1.03417538
   ...

# DOESN'T RUN ! - ERROR
DT[, (cols.d0) := lapply(.SD, scale), keyby=K, .SDcols=cols][]

# WORKS CORRECTLY AS DESIRED !
DT[, (cols.d0) := lapply(.SD, function(x) as.vector(scale(x))), keyby=K, .SDcols=cols][] 
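
One way to convince yourself that the last form does what is intended is to check that every scaled column has mean 0 and standard deviation 1 within each group. The sketch below rebuilds the same toy table and runs that check; the names check, mean_X, sd_X, mean_Y and sd_Y are made up for this illustration.

```r
library(data.table)

# Same toy table as in the answer above.
DT <- data.table(K = c(rep(1, 5), rep(2, 5)), X = (1:10)^2, Y = 2^(1:10))
cols <- c("X", "Y")
cols.d0 <- paste0("d0.", cols)

# Grouped scaling, with as.vector() so each result is a plain numeric vector.
DT[, (cols.d0) := lapply(.SD, function(x) as.vector(scale(x))),
   by = K, .SDcols = cols]

# Per-group summary: means should be (numerically) 0 and sds should be 1.
check <- DT[, .(mean_X = mean(d0.X), sd_X = sd(d0.X),
                mean_Y = mean(d0.Y), sd_Y = sd(d0.Y)), by = K]
print(check)
```

If a column had been scaled across the whole table rather than within each K, the per-group means would not be close to zero.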



Answer 3:


The following answer works.

library(data.table)
# Data 
dt <- iris
setDT(dt)

# columns to apply the scale function
cols <- colnames(dt)[-5]

# standardize within each species
dt[, (cols) := lapply(.SD, function(x) as.vector(scale(x))),
   by = Species, .SDcols = cols]

Since scale() returns a matrix, as.vector() is used to convert the result to a plain vector.
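
To see what that conversion removes, the following small sketch inspects the result of scale() on a plain numeric vector. The input 1:5 is just an arbitrary example.

```r
x <- c(1, 2, 3, 4, 5)

s <- scale(x)
is.matrix(s)          # TRUE: a one-column matrix
dim(s)                # 5 1
names(attributes(s))  # includes "scaled:center" and "scaled:scale"

# as.vector() drops the dim and the scaling attributes.
v <- as.vector(s)
is.null(attributes(v))  # TRUE
```

The values themselves are unchanged by as.vector(); only the matrix shape and the attributes recording the centre and scale are discarded.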



Source: https://stackoverflow.com/questions/24260891/scale-in-data-table-in-r
