Trying to use dplyr to group_by and apply scale()

广开言路 2020-12-05 23:54

Trying to use dplyr to group_by the stud_ID variable in the following data frame, as in this SO question:

> str(df)         


        
2 Answers
  • 2020-12-06 00:20

    This was a known problem in dplyr. A fix has been merged into the development version, which you can install via:

    # install.packages("devtools")
    devtools::install_github("hadley/dplyr")
    

    In the stable version, the following should work, too:

    scale_this <- function(x) as.vector(scale(x))
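
    To see why the as.vector() wrapper helps, here is a minimal sketch in base R (the sample vector x is made up for illustration). scale() returns a one-column matrix carrying scaling attributes, and as.vector() strips it back to a plain numeric vector:

```r
# Hypothetical sample values, chosen only for illustration
x <- c(2, 4, 6, 8)

scaled <- scale(x)
is.matrix(scaled)   # TRUE: scale() returns a one-column matrix
names(attributes(scaled))  # includes "dim", "scaled:center", "scaled:scale"

# The wrapper from the answer above drops the matrix shape and attributes
scale_this <- function(x) as.vector(scale(x))

v <- scale_this(x)
is.matrix(v)        # FALSE: now a plain numeric vector
```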
    
  • 2020-12-06 00:29

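    The problem seems to lie in how the base scale() function behaves: it works on matrices, so given a vector it returns a one-column matrix (with scaling attributes) rather than a plain vector. Try writing your own.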

    scale_this <- function(x){
      (x - mean(x, na.rm=TRUE)) / sd(x, na.rm=TRUE)
    }
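
    As a quick sanity check (a sketch with made-up values, not part of the original question), the hand-written function can be compared against base scale(), including a missing value:

```r
scale_this <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Hypothetical input with one NA
x <- c(1, 2, NA, 4, 5)

z <- scale_this(x)
is.na(z[3])                           # the NA stays NA
all.equal(z, as.vector(scale(x)))     # compare with base scale()
```

    Note that sd() of a single value is NA, so a group with only one observation would come back as NA from this function.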
    

    Then this works:

    library("dplyr")
    
    # reproducible sample data
    set.seed(123)
    n <- 1000
    df <- data.frame(stud_ID = sample(LETTERS, size=n, replace=TRUE),
                     behavioral_scale = runif(n, 0, 10),
                     cognitive_scale = runif(n, 1, 20),
                     affective_scale = runif(n, 0, 1) )
    scaled_data <- 
      df %>%
      group_by(stud_ID) %>%
      mutate(behavioral_scale_ind = scale_this(behavioral_scale),
             cognitive_scale_ind = scale_this(cognitive_scale),
             affective_scale_ind = scale_this(affective_scale))
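
    If a newer dplyr is available, the three mutate() lines can probably be collapsed with across(). This is only a sketch, assuming dplyr 1.0.0 or later (where across() was introduced); the cols_to_scale vector and the trailing ungroup() are my additions, not part of the original answer:

```r
library(dplyr)

scale_this <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Same reproducible sample data as above
set.seed(123)
n <- 1000
df <- data.frame(stud_ID = sample(LETTERS, size = n, replace = TRUE),
                 behavioral_scale = runif(n, 0, 10),
                 cognitive_scale = runif(n, 1, 20),
                 affective_scale = runif(n, 0, 1))

cols_to_scale <- c("behavioral_scale", "cognitive_scale", "affective_scale")

scaled_data <-
  df %>%
  group_by(stud_ID) %>%
  # apply scale_this to each listed column within each student,
  # writing results to new columns with an "_ind" suffix
  mutate(across(all_of(cols_to_scale), scale_this, .names = "{.col}_ind")) %>%
  ungroup()
```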
    

    Or, if you're open to a data.table solution:

    library("data.table")
    
    setDT(df)
    
    cols_to_scale <- c("behavioral_scale","cognitive_scale","affective_scale")
    
    df[, lapply(.SD, scale_this), .SDcols = cols_to_scale, keyby = stud_ID]
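
    The call above returns only the grouping column and the scaled columns. If you would rather keep the original columns and add the scaled ones alongside them, as the dplyr version does, one option is data.table's := operator. A sketch under that assumption (the dt and new_cols names are mine):

```r
library(data.table)

scale_this <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Same reproducible sample data, built directly as a data.table
set.seed(123)
n <- 1000
dt <- data.table(stud_ID = sample(LETTERS, size = n, replace = TRUE),
                 behavioral_scale = runif(n, 0, 10),
                 cognitive_scale = runif(n, 1, 20),
                 affective_scale = runif(n, 0, 1))

cols_to_scale <- c("behavioral_scale", "cognitive_scale", "affective_scale")
new_cols <- paste0(cols_to_scale, "_ind")

# add the scaled columns by reference, computed within each student
dt[, (new_cols) := lapply(.SD, scale_this), .SDcols = cols_to_scale, by = stud_ID]
```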
    