backtransform `scale()` for plotting

别那么骄傲 2020-11-28 07:05

I have an explanatory variable, centered using scale(), that is used to predict a response variable:

d <- data.frame(
  x=runif(100),
           


        
  • 2020-11-28 07:20

    I felt this should be a proper function, so here is my attempt:

    #' Reverse a scale
    #'
    #' Computes x = sz+c, which is the inverse of z = (x - c)/s 
    #' provided by the \code{scale} function.
    #' 
    #' @param z a numeric matrix (or matrix-like) object
    #' @param center either NULL or a numeric vector of length equal to the number of columns of z
    #' @param scale  either NULL or a numeric vector of length equal to the number of columns of z
    #'
    #' @seealso \code{\link{scale}}
    #' @examples
    #'  mtcs <- scale(mtcars)
    #'  
    #'  all.equal(
    #'    unscale(mtcs), 
    #'    as.matrix(mtcars), 
    #'    check.attributes=FALSE
    #'  )
    #'  
    #' @export
    unscale <- function(z, center = attr(z, "scaled:center"), scale = attr(z, "scaled:scale")) {
      if(!is.null(scale))  z <- sweep(z, 2, scale, `*`)
      if(!is.null(center)) z <- sweep(z, 2, center, `+`)
      structure(z,
        "scaled:center"   = NULL,
        "scaled:scale"    = NULL,
        "unscaled:center" = center,
        "unscaled:scale"  = scale
      )
    }
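Applied to the situation in the question (a single scaled variable), the function should round-trip the values. In this sketch `unscale()` is copied verbatim from the answer so the snippet runs on its own, and `x` is simulated data rather than the question's:

```r
# unscale() as defined in the answer above (repeated so this snippet is self-contained)
unscale <- function(z, center = attr(z, "scaled:center"), scale = attr(z, "scaled:scale")) {
  if (!is.null(scale))  z <- sweep(z, 2, scale, `*`)
  if (!is.null(center)) z <- sweep(z, 2, center, `+`)
  structure(z,
    "scaled:center"   = NULL,
    "scaled:scale"    = NULL,
    "unscaled:center" = center,
    "unscaled:scale"  = scale
  )
}

set.seed(42)
x  <- runif(100)        # simulated explanatory variable
xs <- scale(x)          # 100 x 1 matrix carrying the "scaled:*" attributes
x_back <- unscale(xs)   # back on the original scale, still a 100 x 1 matrix

# Compare values only: x_back keeps its dim and gains "unscaled:*" attributes
print(all.equal(as.vector(x_back), x))
print(is.null(attr(x_back, "scaled:center")))  # the "scaled:*" attributes are removed
```

For plotting, `as.vector(x_back)` can then be used on the x axis in place of the scaled variable.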
    
  • 2020-11-28 07:22

    tl;dr:

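    unscaled_vals <- xs * attr(xs, 'scaled:scale') + attr(xs, 'scaled:center')
    
    • where xs is a scaled object created by scale(x) from a single variable x (for a multi-column matrix the scale and center vectors must be applied column by column, as in the sweep() and t() answers)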

    For anyone trying to make sense of this:

    How R scales:

    The scale function performs both scaling and centering by default.

    • Of the two, the function performs centering first.

    By default, centering subtracts the mean of the non-missing (!is.na) input values from each value:

    data - mean(data, na.rm = TRUE)
    

    Scaling is achieved via:

    sqrt( sum(x^2) / (n - 1) )
    

    where x is the set of non-missing values to scale and n is the number of those values.

    • Importantly, though, when center = TRUE, x is not the original data but the already-centered data.

      So if center = TRUE (the default), the scaling step is really calculating:

       sqrt( sum( (data - mean(data, na.rm = TRUE))^2, na.rm = TRUE ) / (n - 1) )
      
      • Note: [when center = TRUE] this is the same as the standard deviation: sd(data, na.rm = TRUE).
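The description above can be checked numerically. This sketch uses a made-up vector `x` with one missing value: it centers by the mean of the non-missing values, divides by sqrt(sum(xc^2)/(n - 1)) with n counting only non-missing values, and compares the result with scale() and sd():

```r
x <- c(2, 4, 4, 5, 7, 9, NA)          # made-up data with one missing value

ok <- !is.na(x)
xc <- x - mean(x, na.rm = TRUE)       # step 1: centering (NA stays NA)
n  <- sum(ok)                         # number of non-missing values
s  <- sqrt(sum(xc[ok]^2) / (n - 1))   # step 2: root-mean-square of the centered data

manual <- xc / s
xs     <- scale(x)                    # 7 x 1 matrix, NA kept in place

print(all.equal(as.vector(xs), manual))          # same values as scale()
print(all.equal(attr(xs, "scaled:scale"), s))    # the stored scale factor
print(all.equal(s, sd(x, na.rm = TRUE)))         # equals the standard deviation
```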

    How to Unscale:

    Explanation:

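    1. First, multiply the scaled values xs by the scaling factor (the standard deviation of the original data x):

      y <- xs * sqrt( sum( (x - mean(x, na.rm = TRUE))^2, na.rm = TRUE ) / (sum(!is.na(x)) - 1) )
      
    2. Then add back the mean of the original data:

      y + mean(x, na.rm = TRUE)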
      

    Obviously you need the mean and standard deviation of the original data for this manual approach to be useful, but I include it here for the sake of the concept.
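The two manual steps can be sketched as follows, assuming the original vector `x` (simulated here) is still available alongside its scaled copy `xs`:

```r
set.seed(1)
x  <- rnorm(50, mean = 10, sd = 3)   # simulated original data
xs <- scale(x)                       # centered and scaled copy (50 x 1 matrix)

# Step 1: multiply by the scale factor, i.e. the standard deviation of the original data
y <- xs * sd(x, na.rm = TRUE)
# Step 2: add back the mean of the original data
x_back <- y + mean(x, na.rm = TRUE)

print(all.equal(as.vector(x_back), x))
```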

    Luckily, as previous answers have shown, both the centering value (the mean) and the scaling value (the standard deviation) are stored as attributes of the object returned by scale(), so this approach can be simplified to:

    How to do in R:

    unscaled_vals <- xs * attr(xs, 'scaled:scale') + attr(xs, 'scaled:center')
    
    • where xs is a scaled object created by scale(x) from a single variable x.
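The attribute one-liner assumes `xs` has a single column. For a matrix with several columns, `xs * attr(xs, "scaled:scale")` recycles the scale vector down the columns rather than matching one value per column, so the back-transform has to be applied column-wise, for example with `sweep()`. A small sketch contrasting the two on made-up data:

```r
m  <- cbind(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))  # made-up two-column data
ms <- scale(m)
s  <- attr(ms, "scaled:scale")
cc <- attr(ms, "scaled:center")

# Single-column formula applied to a matrix: s and cc recycle element-wise down columns
naive <- ms * s + cc
# Column-wise back-transform: each column gets its own scale, then its own center
fixed <- sweep(sweep(ms, 2, s, `*`), 2, cc, `+`)

print(isTRUE(all.equal(naive, m, check.attributes = FALSE)))  # FALSE
print(isTRUE(all.equal(fixed, m, check.attributes = FALSE)))  # TRUE
```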
  • 2020-11-28 07:22

    Old question, but why wouldn't you just do this:

    plot(d$x, predict(m1, d))
    

    As an easier alternative to reading the attributes of the scaled object by hand, the DMwR package has a function for this, unscale(). It works like this:

    d <- data.frame(
      x=runif(100)
    )
    
    d$y <- 17 + d$x * 12
    
    s.x <- scale(d$x)
    
    m1 <- lm(d$y~s.x)
    
    library(DMwR)
    unsc.x <- unscale(s.x, s.x)   # back-transform the scaled values s.x using s.x's own attributes
    plot(unsc.x, predict(m1, d))
    

    Importantly, the second argument of unscale() must be an object carrying the 'scaled:scale' and 'scaled:center' attributes (i.e., the output of scale()); the first argument holds the scaled values to transform back.
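If you would rather avoid a package dependency, base R's `scale()` can perform the inverse itself: with `center = -c/s` and `scale = 1/s` it computes `(z + c/s) * s = z*s + c`. A minimal sketch on the question's setup, with `c` and `s` read from the attributes of `s.x` (this is an alternative, not a claim about how DMwR implements it):

```r
set.seed(1)
d <- data.frame(x = runif(100))
d$y <- 17 + d$x * 12
s.x <- scale(d$x)

ctr <- attr(s.x, "scaled:center")
scl <- attr(s.x, "scaled:scale")

# scale() subtracts `center` and then divides by `scale`, so these arguments invert it
unsc.x <- scale(s.x, center = -ctr / scl, scale = 1 / scl)
attr(unsc.x, "scaled:center") <- NULL   # drop the attributes the second scale() added
attr(unsc.x, "scaled:scale")  <- NULL

print(all.equal(as.vector(unsc.x), d$x))
```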

  • 2020-11-28 07:23

    Inspired by Fermando's answer, but with a shorter unscaling line:

    set.seed(1)
    x = matrix(sample(1:12), ncol= 3)
    xs = scale(x, center = TRUE, scale = TRUE)
    center <- attr(xs,"scaled:center")
    scale <- attr(xs,"scaled:scale")
    x.orig <- t(t(xs) * scale + center) # the shorter unscaling line
    
    print(x)
         [,1] [,2] [,3]
    [1,]    9    2    6
    [2,]    4    5   11
    [3,]    7    3   12
    [4,]    1    8   10
    
    print(x.orig)
         [,1] [,2] [,3]
    [1,]    9    2    6
    [2,]    4    5   11
    [3,]    7    3   12
    [4,]    1    8   10
    attr(,"scaled:center")
    [1] 5.25 4.50 9.75
    attr(,"scaled:scale")
    [1] 3.500000 2.645751 2.629956
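The double transpose works because R recycles a shorter vector down the columns of a matrix: after `t(xs)` each original column becomes a row, so the length-3 `scale` and `center` vectors line up with those rows, and the outer `t()` restores the orientation. A quick check that it agrees with an explicit column-wise `sweep()` (the sampled values depend on your R version's sampler, but the comparison does not):

```r
set.seed(1)
x  <- matrix(sample(1:12), ncol = 3)
xs <- scale(x, center = TRUE, scale = TRUE)
center <- attr(xs, "scaled:center")
scale  <- attr(xs, "scaled:scale")

# t(xs) is 3 x 4, so a length-3 vector recycles once per column of t(xs),
# pairing scale[i] and center[i] with original column i
via_t     <- t(t(xs) * scale + center)
via_sweep <- sweep(sweep(xs, 2, scale, `*`), 2, center, `+`)

print(all.equal(via_t, via_sweep, check.attributes = FALSE))
print(all.equal(via_t, x, check.attributes = FALSE))
```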
    
  • 2020-11-28 07:32

    I'm late to the party, but here is a useful tool to scale and unscale data in array format.

    Example:

    > (data <- array(1:8, c(2, 4)))            # create data
         [,1] [,2] [,3] [,4]
    [1,]    1    3    5    7
    [2,]    2    4    6    8
    > obj <- Scale(data)                       # create object
    > (data_scaled <- obj$scale(data))         # scale data
               [,1]       [,2]       [,3]       [,4]
    [1,] -0.7071068 -0.7071068 -0.7071068 -0.7071068
    [2,]  0.7071068  0.7071068  0.7071068  0.7071068
    > (obj$unscale(data_scaled))               # unscale scaled data
         [,1] [,2] [,3] [,4]
    [1,]    1    3    5    7
    [2,]    2    4    6    8
    
    ## scale or unscale another dataset
    ## using the same mean/sd parameters
    > (data2 <- array(seq(1, 24, 2), c(3, 4))) # create demo data
         [,1] [,2] [,3] [,4]
    [1,]    1    7   13   19
    [2,]    3    9   15   21
    [3,]    5   11   17   23
    > (data2_scaled <- obj$scale(data2))       # scale data
               [,1]      [,2]     [,3]     [,4]
    [1,] -0.7071068  4.949747 10.60660 16.26346
    [2,]  2.1213203  7.778175 13.43503 19.09188
    [3,]  4.9497475 10.606602 16.26346 21.92031
    > (obj$unscale(data2_scaled))              # unscale scaled data
         [,1] [,2] [,3] [,4]
    [1,]    1    7   13   19
    [2,]    3    9   15   21
    [3,]    5   11   17   23
    

    Function Scale():

    Scale <- function(data, margin=2, center=TRUE, scale=TRUE){
        stopifnot(is.array(data), is.numeric(data),
                  any(mode(margin) %in% c("integer", "numeric")),
                  length(margin) < length(dim(data)),
                  max(margin) <= length(dim(data)),
                  min(margin) >= 1,
                  !any(duplicated(margin)),
                  is.logical(center), length(center)==1,
                  is.logical(scale), length(scale)==1,
                  !(isFALSE(center) && isFALSE(scale)))
        margin <- as.integer(margin)
    
        # compute the statistics along `margin` (previously hard-coded to 2)
        m <- if(center) apply(data, margin, mean, na.rm=TRUE) else NULL
        s <- if(scale)  apply(data, margin, sd, na.rm=TRUE) else NULL
        ldim <- length(dim(data))
        cdim <- dim(data)[margin]
        data <- NULL # don't store the data
    
        Scale <- function(data){
            stopifnot(is.array(data), is.numeric(data),
                      length(dim(data)) == ldim,
                      dim(data)[margin] == cdim)
            if(center)
                data <- sweep(data, margin, m, `-`)
            if(scale)
                data <- sweep(data, margin, s, `/`)
            data
        }
    
        Unscale <- function(data){
            stopifnot(is.array(data), is.numeric(data),
                      length(dim(data)) == ldim,
                      dim(data)[margin] == cdim)
            if(scale)
                data <- sweep(data, margin, s, `*`)
            if(center)
                data <- sweep(data, margin, m, `+`)
            data
        }
        list(scale=Scale, unscale=Unscale, mean=m, sd=s)
    }
    

    Note: data.frames are not supported yet.
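For a numeric data frame, base R's `scale()` is one alternative: it converts the data frame to a matrix and accepts numeric `center` and `scale` vectors, so the parameters learned on one dataset can be reused on another and then inverted column by column. A sketch with made-up data frames named `train` and `test` (hypothetical names, for illustration only):

```r
train <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 50))  # made-up data
test  <- data.frame(a = c(2, 5),       b = c(15, 60))

train_s <- scale(train)                 # returns a matrix, not a data frame
ctr <- attr(train_s, "scaled:center")
scl <- attr(train_s, "scaled:scale")

# Scale the new data with the training mean/sd rather than its own
test_s <- scale(test, center = ctr, scale = scl)

# Back-transform column by column and rebuild a data frame
test_back <- as.data.frame(sweep(sweep(test_s, 2, scl, `*`), 2, ctr, `+`))

print(all.equal(test_back$a, test$a))
print(all.equal(test_back$b, test$b))
```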

  • 2020-11-28 07:44

    For a data frame or matrix:

    set.seed(1)
    x = matrix(sample(1:12), ncol= 3)
    xs = scale(x, center = TRUE, scale = TRUE)
    
    x.orig = t(apply(xs, 1, function(r)r*attr(xs,'scaled:scale') + attr(xs, 'scaled:center')))
    
    print(x)
         [,1] [,2] [,3]
    [1,]    4    2    3
    [2,]    5    7    1
    [3,]    6   10   11
    [4,]    9   12    8
    
    print(x.orig)
         [,1] [,2] [,3]
    [1,]    4    2    3
    [2,]    5    7    1
    [3,]    6   10   11
    [4,]    9   12    8
    

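    Be careful when using functions like identical(): the round trip can leave tiny floating-point errors (and here x is an integer matrix while x.orig is double), so exact comparison fails: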

    print(x - x.orig)
         [,1] [,2]         [,3]
    [1,]    0    0 0.000000e+00
    [2,]    0    0 8.881784e-16
    [3,]    0    0 0.000000e+00
    [4,]    0    0 0.000000e+00
    
    identical(x, x.orig)
    # FALSE
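A tolerance-based comparison is the safer check. In this sketch `identical()` fails for two independent reasons (the `sample()` result is an integer matrix while the back-transformed one is double, and the round trip may leave rounding error around 1e-16), whereas `all.equal()` compares numerically with a default tolerance of about 1.5e-8:

```r
set.seed(1)
x  <- matrix(sample(1:12), ncol = 3)
xs <- scale(x, center = TRUE, scale = TRUE)
x.orig <- t(apply(xs, 1, function(r) r * attr(xs, "scaled:scale") + attr(xs, "scaled:center")))

print(typeof(x))                      # "integer"
print(typeof(x.orig))                 # "double"
print(identical(x, x.orig))           # FALSE
print(isTRUE(all.equal(x, x.orig)))   # TRUE
```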
    