backtransform `scale()` for plotting

前端未结


I have an explanatory variable that is centered using scale() and then used to predict a response variable:

d <- data.frame(
  x=runif(100),


                      
8 answers

        
                         				            
            
           
            
                              
                
              
              
                
                  执念已碎        
                
              
                            
                2020-11-28 07:20
              
            
            
                                                                       
I felt this should be a proper function, so here is my attempt at it:

#' Reverse a scale
#'
#' Computes x = sz+c, which is the inverse of z = (x - c)/s 
#' provided by the \code{scale} function.
#' 
#' @param z a numeric matrix(like) object
#' @param center either NULL or a numeric vector of length equal to the number of columns of z  
#' @param scale  either NULL or a numeric vector of length equal to the number of columns of z
#'
#' @seealso \code{\link{scale}}
#' @examples
#'  mtcs <- scale(mtcars)
#'  
#'  all.equal(
#'    unscale(mtcs), 
#'    as.matrix(mtcars), 
#'    check.attributes=FALSE
#'  )
#'  
#' @export
unscale <- function(z, center = attr(z, "scaled:center"), scale = attr(z, "scaled:scale")) {
  if(!is.null(scale))  z <- sweep(z, 2, scale, `*`)
  if(!is.null(center)) z <- sweep(z, 2, center, `+`)
  structure(z,
    "scaled:center"   = NULL,
    "scaled:scale"    = NULL,
    "unscaled:center" = center,
    "unscaled:scale"  = scale
  )
}
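
One caveat when relying on the default arguments: subsetting a scaled matrix keeps dim and dimnames but drops the "scaled:center" and "scaled:scale" attributes, so both defaults become NULL and nothing is undone. A minimal sketch of the workaround, assuming a condensed copy of the function above (repeated only so the snippet runs on its own; the names ctr, scl and sub are just for illustration):

```r
# Condensed copy of the unscale() function from this answer.
unscale <- function(z, center = attr(z, "scaled:center"), scale = attr(z, "scaled:scale")) {
  if (!is.null(scale))  z <- sweep(z, 2, scale, `*`)
  if (!is.null(center)) z <- sweep(z, 2, center, `+`)
  z
}

mtcs <- scale(mtcars)
ctr  <- attr(mtcs, "scaled:center")  # save the attributes before subsetting
scl  <- attr(mtcs, "scaled:scale")

# Subsetting drops the scaling attributes.
sub <- mtcs[1:5, ]
is.null(attr(sub, "scaled:center"))

# With the defaults both arguments are NULL, so `sub` comes back unchanged.
unchanged <- unscale(sub)

# Supplying the saved attributes restores the original units.
restored <- unscale(sub, center = ctr, scale = scl)
all.equal(restored, as.matrix(mtcars)[1:5, ], check.attributes = FALSE)
```

Saving the attributes right after calling scale() is probably the safest habit if the scaled object is going to be subset or passed through other functions.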

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  别跟我提以往        
                
              
                            
                2020-11-28 07:22
              
            
            
                                                                       
tl;dr:
unscaled_vals <- xs * attr(xs, 'scaled:scale') + attr(xs, 'scaled:center')


where xs is a scaled object created by scale(x)


For those trying to make sense of this:
How R scales:
The scale function performs both centering and scaling by default.

Of the two, the function performs centering first.

By default, centering subtracts the mean of all non-NA input values from each value:
data - mean(data, na.rm = TRUE)

Scaling then divides each value by:
sqrt(sum(x^2) / (n - 1))

where x is the set of all non-NA values to scale and n = length(x).

Importantly, though, when center = TRUE in scale, x is not the original data but the already centered data.
So if center = TRUE (the default), the scaling factor is really:
sqrt(sum((data - mean(data, na.rm = TRUE))^2, na.rm = TRUE) / (n - 1))


Note: when center = TRUE, this is the same as the standard deviation: sd(data, na.rm = TRUE).
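
The description above can be checked numerically. This is a small sketch with made-up data (the variable names are only for illustration), comparing the "scaled:scale" attribute against the manual formula, with and without centering:

```r
set.seed(42)
x <- c(rnorm(20, mean = 5, sd = 2), NA)   # made-up data with one missing value
n <- sum(!is.na(x))                       # number of non-NA values

# Default: center first, then scale.
xs <- scale(x)
centered     <- x - mean(x, na.rm = TRUE)
manual_scale <- sqrt(sum(centered^2, na.rm = TRUE) / (n - 1))

all.equal(attr(xs, "scaled:center"), mean(x, na.rm = TRUE))
all.equal(attr(xs, "scaled:scale"), manual_scale)
all.equal(manual_scale, sd(x, na.rm = TRUE))

# Without centering, the same formula is applied to the raw values,
# which gives a root-mean-square rather than the standard deviation.
xr  <- scale(x, center = FALSE)
rms <- sqrt(sum(x^2, na.rm = TRUE) / (n - 1))
all.equal(attr(xr, "scaled:scale"), rms)
```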



How to Unscale:
Explanation:

First multiply the scaled values xs by the scaling factor, computed from the original data x (n is the number of non-NA values in x):
y = xs * sqrt(sum((x - mean(x, na.rm = TRUE))^2, na.rm = TRUE) / (n - 1))


Then add back the mean:
y + mean(x, na.rm = TRUE)



This manual approach needs the original data (for its mean and standard deviation), so I include it only for conceptual sake.
Luckily, as previous answers have shown, both the centering value (the mean) and the scaling value (the standard deviation) are stored as attributes of the scaled object, so this approach simplifies to:
How to do it in R:
unscaled_vals <- xs * attr(xs, 'scaled:scale') + attr(xs, 'scaled:center')


where xs is a scaled object created by scale(x) from a single vector or one-column matrix. For a multi-column matrix, the attributes have to be applied column by column, for example with sweep().
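
As a quick round-trip check, here is a sketch with made-up numbers. The vector case uses the one-liner directly; the matrix case uses sweep() so that each column gets its own scale and center rather than having the attribute vectors recycled down the columns:

```r
# Single vector: multiply by the scale, then add the center.
x    <- c(2, 4, 4, 5, 10)
xs   <- scale(x)
back <- xs * attr(xs, "scaled:scale") + attr(xs, "scaled:center")
stopifnot(isTRUE(all.equal(as.vector(back), x)))

# Multi-column matrix: apply the attributes along margin 2 (columns).
m      <- cbind(a = c(1, 2, 3, 4), b = c(10, 30, 20, 60))
ms     <- scale(m)
m_back <- sweep(ms, 2, attr(ms, "scaled:scale"), `*`)
m_back <- sweep(m_back, 2, attr(ms, "scaled:center"), `+`)
stopifnot(isTRUE(all.equal(m_back, m, check.attributes = FALSE)))
```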

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  日久生厌        
                
              
                            
                2020-11-28 07:22
              
            
            
                                                                       
Old question, but why wouldn't you just do this:

plot(d$x, predict(m1, d))


As an easier way than manually using the attributes from the scaled object, DMwR has a function for this: unscale.  It works like this:

d <- data.frame(
  x=runif(100)
)

d$y <- 17 + d$x * 12

s.x <- scale(d$x)

m1 <- lm(d$y~s.x)

library(DMwR)
unsc.x <- unscale(s.x, s.x)  # first argument: the scaled values to back-transform
plot(unsc.x, predict(m1, d))


Importantly, the second argument of unscale must be an object that carries the 'scaled:scale' and 'scaled:center' attributes, such as the result of scale().
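
If you would rather avoid the extra package, the same plot can be sketched in base R. This assumes the setup from the snippet above plus some added noise (the noise and the names grid_x, grid_s and pred are my own additions for illustration): build a grid in the original units, convert it to the scaled units with the stored attributes for prediction, and plot against the original units.

```r
set.seed(1)
d   <- data.frame(x = runif(100))
d$y <- 17 + d$x * 12 + rnorm(100, sd = 0.5)   # noise added for illustration

s.x   <- scale(d$x)
d$s.x <- as.vector(s.x)        # store the scaled predictor as a plain column
m1    <- lm(y ~ s.x, data = d)

# Grid in original units, transformed with the same center and scale as the fit.
grid_x <- seq(min(d$x), max(d$x), length.out = 50)
grid_s <- (grid_x - attr(s.x, "scaled:center")) / attr(s.x, "scaled:scale")
pred   <- predict(m1, newdata = data.frame(s.x = grid_s))

# The x-axis is in original units even though the model used the scaled predictor.
plot(d$x, d$y, xlab = "x (original units)", ylab = "y")
lines(grid_x, pred)
```

Keeping the scaled predictor as a named column in the data frame also means predict() looks it up in newdata instead of falling back to a variable in the workspace.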
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  猫巷女王i        
                
              
                            
                2020-11-28 07:23
              
            
            
                                                                       
Inspired by Fermando's answer, but with a shorter unscaling line:
set.seed(1)
x = matrix(sample(1:12), ncol= 3)
xs = scale(x, center = TRUE, scale = TRUE)
center <- attr(xs,"scaled:center")
scale <- attr(xs,"scaled:scale")
x.orig <- t(t(xs) * scale + center) # code is less here

print(x)
     [,1] [,2] [,3]
[1,]    9    2    6
[2,]    4    5   11
[3,]    7    3   12
[4,]    1    8   10

print(x.orig)
     [,1] [,2] [,3]
[1,]    9    2    6
[2,]    4    5   11
[3,]    7    3   12
[4,]    1    8   10
attr(,"scaled:center")
[1] 5.25 4.50 9.75
attr(,"scaled:scale")
[1] 3.50 2.65 2.63

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  孤独总比滥情好        
                
              
                            
                2020-11-28 07:32
              
            
            
                                                                       
I am late to the party, but here is a useful tool for scaling and unscaling data in array format.
Example:
> (data <- array(1:8, c(2, 4)))            # create data
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
> obj <- Scale(data)                       # create object
> (data_scaled <- obj$scale(data))         # scale data
           [,1]       [,2]       [,3]       [,4]
[1,] -0.7071068 -0.7071068 -0.7071068 -0.7071068
[2,]  0.7071068  0.7071068  0.7071068  0.7071068
> (obj$unscale(data_scaled))               # unscale scaled data
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8

## scale or unscale another dataset
## using the same mean/sd parameters
> (data2 <- array(seq(1, 24, 2), c(3, 4))) # create demo data
     [,1] [,2] [,3] [,4]
[1,]    1    7   13   19
[2,]    3    9   15   21
[3,]    5   11   17   23
> (data2_scaled <- obj$scale(data2))       # scale data
           [,1]      [,2]     [,3]     [,4]
[1,] -0.7071068  4.949747 10.60660 16.26346
[2,]  2.1213203  7.778175 13.43503 19.09188
[3,]  4.9497475 10.606602 16.26346 21.92031
> (obj$unscale(data2_scaled))              # unscale scaled data
     [,1] [,2] [,3] [,4]
[1,]    1    7   13   19
[2,]    3    9   15   21
[3,]    5   11   17   23

Function Scale():
Scale <- function(data, margin=2, center=TRUE, scale=TRUE){
    stopifnot(is.array(data), is.numeric(data),
              any(mode(margin) %in% c("integer", "numeric")),
              length(margin) < length(dim(data)),
              max(margin) <= length(dim(data)),
              min(margin) >= 1,
              !any(duplicated(margin)),
              is.logical(center), length(center)==1,
              is.logical(scale), length(scale)==1,
                  !(isFALSE(center) && isFALSE(scale)))
    margin <- as.integer(margin)

    # compute the statistics along `margin` so they match the sweep() calls below
    m <- if(center) apply(data, margin, mean, na.rm=TRUE) else NULL
    s <- if(scale)  apply(data, margin, sd, na.rm=TRUE) else NULL
    ldim <- length(dim(data))
    cdim <- dim(data)[margin]
    data <- NULL # don't store the data

    Scale <- function(data){
        stopifnot(is.array(data), is.numeric(data),
                  length(dim(data)) == ldim,
                  dim(data)[margin] == cdim)
        if(center)
            data <- sweep(data, margin, m, `-`)
        if(scale)
            data <- sweep(data, margin, s, `/`)
        data
    }

    Unscale <- function(data){
        stopifnot(is.array(data), is.numeric(data),
                  length(dim(data)) == ldim,
                  dim(data)[margin] == cdim)
        if(scale)
            data <- sweep(data, margin, s, `*`)
        if(center)
            data <- sweep(data, margin, m, `+`)
        data
    }
    list(scale=Scale, unscale=Unscale, mean=m, sd=s)
}

Note:
Data frames are not supported yet.
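
For data frames, one possible workaround is to go through a matrix and back. The helpers below, scale_df() and unscale_df(), are hypothetical and not part of Scale() above; this is only a sketch that assumes every column is numeric:

```r
# Hypothetical helper: scale an all-numeric data frame and keep the parameters.
scale_df <- function(df) {
  stopifnot(is.data.frame(df), all(vapply(df, is.numeric, logical(1))))
  m <- scale(as.matrix(df))
  list(data   = as.data.frame(m),
       center = attr(m, "scaled:center"),
       scale  = attr(m, "scaled:scale"))
}

# Hypothetical helper: undo the scaling with the stored parameters.
unscale_df <- function(scaled, center, scale) {
  m <- sweep(as.matrix(scaled), 2, scale, `*`)
  m <- sweep(m, 2, center, `+`)
  as.data.frame(m)
}

df       <- data.frame(a = c(1, 2, 3, 4), b = c(10, 30, 20, 60))
s        <- scale_df(df)
restored <- unscale_df(s$data, s$center, s$scale)

# Round trip: values match up to floating-point tolerance.
max(abs(as.matrix(restored) - as.matrix(df))) < 1e-12
```

The parameters are returned alongside the data because converting the scaled matrix back to a data frame loses the attributes that scale() attached.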
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  一生所求        
                
              
                            
                2020-11-28 07:44
              
            
            
                                                                       
For a data frame or matrix:

set.seed(1)
x = matrix(sample(1:12), ncol= 3)
xs = scale(x, center = TRUE, scale = TRUE)

x.orig = t(apply(xs, 1, function(r)r*attr(xs,'scaled:scale') + attr(xs, 'scaled:center')))

print(x)
     [,1] [,2] [,3]
[1,]    4    2    3
[2,]    5    7    1
[3,]    6   10   11
[4,]    9   12    8

print(x.orig)
     [,1] [,2] [,3]
[1,]    4    2    3
[2,]    5    7    1
[3,]    6   10   11
[4,]    9   12    8


Be careful with exact comparisons such as identical(), because the round trip can introduce tiny floating-point differences:

print(x - x.orig)
     [,1] [,2]         [,3]
[1,]    0    0 0.000000e+00
[2,]    0    0 8.881784e-16
[3,]    0    0 0.000000e+00
[4,]    0    0 0.000000e+00

identical(x, x.orig)
# FALSE
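
A tolerance-based comparison is usually what you want here. The sketch below reuses the code from this answer and swaps identical() for all.equal(). Note that identical() would fail in this case even without rounding error, since x is stored as integer while the back-transformed matrix is double.

```r
set.seed(1)
x  <- matrix(sample(1:12), ncol = 3)
xs <- scale(x, center = TRUE, scale = TRUE)

x.orig <- t(apply(xs, 1, function(r) r * attr(xs, "scaled:scale") + attr(xs, "scaled:center")))

# Storage types differ, so an exact comparison cannot succeed.
typeof(x)
typeof(x.orig)
identical(x, x.orig)

# all.equal() compares numerically within a small tolerance;
# wrap it in isTRUE() when using it inside if() or stopifnot().
isTRUE(all.equal(x, x.orig))
```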

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
   
          