How to remove columns from a data.frame by data type?

误落风尘 2020-12-06 12:35

I have a data.frame with almost 200 variables (columns) of different data types (num, int, logi, factor). Now, I would like to remove all the variables of the type "fact

2 Answers
  • 2020-12-06 12:51

    Here's a tidyverse solution, adapted from here, that uses select_if() to keep only the columns for which a predicate function returns TRUE:

    library(lubridate)
    #> 
    #> Attaching package: 'lubridate'
    #> The following object is masked from 'package:base':
    #> 
    #>     date
    library(tidyverse)
    
    # Create dummy dataset with multiple variable types
    df <- 
      tibble::tribble(
      ~var_num_1, ~var_num_2,   ~var_char, ~var_fct, ~var_date,
               1,         10,      "this",   "THIS", "2019-12-18",
               2,         20,        "is",     "IS", "2019-12-19",
               3,         30,     "dummy",  "DUMMY", "2019-12-20",
               4,         40, "character", "FACTOR", "2019-12-21",
               5,         50,      "text",   "TEXT", "2019-12-22"
      ) %>% 
      mutate(
        var_fct = as_factor(var_fct),
        var_date = as_date(var_date)
      )
    
    
    # Select numeric variables
    df %>% select_if(is.numeric)
    #> # A tibble: 5 x 2
    #>   var_num_1 var_num_2
    #>       <dbl>     <dbl>
    #> 1         1        10
    #> 2         2        20
    #> 3         3        30
    #> 4         4        40
    #> 5         5        50
    
    # Select character variables
    df %>% select_if(is.character)
    #> # A tibble: 5 x 1
    #>   var_char 
    #>   <chr>    
    #> 1 this     
    #> 2 is       
    #> 3 dummy    
    #> 4 character
    #> 5 text
    
    # Select factor variables
    df %>% select_if(is.factor)
    #> # A tibble: 5 x 1
    #>   var_fct
    #>   <fct>  
    #> 1 THIS   
    #> 2 IS     
    #> 3 DUMMY  
    #> 4 FACTOR 
    #> 5 TEXT
    
    # Select date variables
    df %>% select_if(is.Date)
    #> # A tibble: 5 x 1
    #>   var_date  
    #>   <date>    
    #> 1 2019-12-18
    #> 2 2019-12-19
    #> 3 2019-12-20
    #> 4 2019-12-21
    #> 5 2019-12-22
    
    # Select variables using negation (note the use of `~`)
    df %>% select_if(~!is.numeric(.))
    #> # A tibble: 5 x 3
    #>   var_char  var_fct var_date  
    #>   <chr>     <fct>   <date>    
    #> 1 this      THIS    2019-12-18
    #> 2 is        IS      2019-12-19
    #> 3 dummy     DUMMY   2019-12-20
    #> 4 character FACTOR  2019-12-21
    #> 5 text      TEXT    2019-12-22
    

    Created on 2019-12-18 by the reprex package (v0.3.0)
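
    For the question as asked (dropping factor columns), the same negation idea applies. The sketch below assumes dplyr 1.0.0 or later, where select_if() is superseded by select() combined with where(). The small data frame `df` is a made-up example, not the asker's data.

    ```r
    library(dplyr)

    # Hypothetical example data: two numeric columns, one factor, one logical
    df <- data.frame(
      x = c(1.5, 2.5, 3.5),
      n = 1:3,
      f = factor(c("a", "b", "a")),
      flag = c(TRUE, FALSE, TRUE)
    )

    # Drop every factor column; the remaining columns keep their original order
    no_factors <- df %>% select(!where(is.factor))
    names(no_factors)

    # Same result with the older, superseded helper used in the reprex above
    no_factors_old <- df %>% select_if(~ !is.factor(.))
    identical(names(no_factors), names(no_factors_old))
    ```

    Both forms return a data frame, so they can be piped straight into further dplyr verbs.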

  • 2020-12-06 12:56

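    Assuming a generic data.frame, this removes all columns of class factor. Note that it returns a data frame with zero columns if df contains no factor columns, because -which(...) then indexes with an empty vector: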

    df[,-which(sapply(df, class) == "factor")]
    

    EDIT

    As per @Roland's suggestion, you can also just keep the columns that are not factors, which avoids which() altogether. Use whichever you prefer.

    df[, sapply(df, class) != "factor"]
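
    One caveat with comparing sapply(df, class) to a single string: a column can carry more than one class (an ordered factor has the classes "ordered" and "factor"), so the comparison is no longer a simple per-column test. A minimal sketch of a more defensive variant follows, using is.factor() with vapply(), plus drop = FALSE so the result stays a data frame even when only one column is left. The data frame is a made-up example.

    ```r
    # Hypothetical example data, including an ordered factor
    df <- data.frame(
      x = c(1.5, 2.5, 3.5),
      f = factor(c("a", "b", "a")),
      o = factor(c("lo", "hi", "lo"), levels = c("lo", "hi"), ordered = TRUE)
    )

    # is.factor() is TRUE for both plain and ordered factors
    is_fct <- vapply(df, is.factor, logical(1))

    # drop = FALSE keeps a data.frame even though only one column remains
    no_factors <- df[, !is_fct, drop = FALSE]
    str(no_factors)
    ```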
    

    EDIT 2

    Since you are concerned with the cor function, @Ista also points out that it would be safer in that particular case to filter on is.numeric. The snippets above only remove factor columns; any other non-numeric columns would remain.

    df[,sapply(df, is.numeric)]
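
    To tie this back to cor(): it needs numeric input, so keeping only the numeric columns first avoids the error a factor or character column would raise. A minimal sketch with a made-up data frame, where vapply() and drop = FALSE are defensive choices rather than part of the original suggestion:

    ```r
    # Hypothetical example data: two numeric columns and one factor
    df <- data.frame(
      a = c(1, 2, 3, 4),
      b = c(2, 4, 6, 8),
      g = factor(c("u", "v", "u", "v"))
    )

    # Keep only the numeric columns, always as a data.frame
    num_df <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]

    # Correlation matrix of the numeric columns only
    cor_mat <- cor(num_df)
    cor_mat
    ```

    Note that is.numeric() is FALSE for logical columns, so those are dropped as well.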
    