Recoding variables with R

长情又很酷 2020-11-28 04:36

Recoding variables in R seems to be my biggest headache. What functions, packages, and processes do you use to ensure the best result?

I've found very few useful exam

5 Answers
  • 2020-11-28 05:09

    I've found that it can sometimes be easier to convert non-numeric factors to character before attempting to change them, for example:

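    # Build a factor column explicitly (R >= 4.0 no longer does this by default)
    df <- data.frame(example = letters[1:26], stringsAsFactors = TRUE)
    example <- as.character(df$example)           # factor -> character
    example[example %in% letters[1:20]] <- "a"    # a..t become "a"
    example[example %in% letters[21:26]] <- "b"   # u..z become "b"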
    

    Also, when importing data, it can be useful to ensure that numbers are actually numeric before attempting to convert:

    df <- data.frame(example=1:100)
    example <- as.numeric(df$example)
    example[example < 20] <- 1
    example[example >= 20 & example < 80] <- 2
    example[example >= 80] <- 3
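
    One thing worth checking at this step: if the imported column arrived as a factor rather than as numbers, as.numeric() returns the internal level codes, not the printed values. A minimal sketch of the pitfall and the usual workaround (the vector f is made up for illustration):

    ```r
    # A factor whose labels look like numbers (hypothetical imported data)
    f <- factor(c("10", "20", "80", "20"))

    # as.numeric() on a factor gives the integer level codes, not the labels
    codes <- as.numeric(f)
    codes    # 1 2 3 2

    # Going through character first recovers the actual numbers
    values <- as.numeric(as.character(f))
    values   # 10 20 80 20
    ```

    After that conversion, the comparisons in the example above behave as expected.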
    
  • 2020-11-28 05:10

    When you want to recode levels of a factor, forcats might come in handy. You can read a chapter of R for Data Science for an extensive tutorial, but here is the gist of it.

    library(tidyverse)
    library(forcats)
    gss_cat %>%
      mutate(partyid = fct_recode(partyid,
                               "Republican, strong"    = "Strong republican",
                               "Republican, weak"      = "Not str republican",
                               "Independent, near rep" = "Ind,near rep",
                               "Independent, near dem" = "Ind,near dem",
                               "Democrat, weak"        = "Not str democrat",
                               "Democrat, strong"      = "Strong democrat",
                               "Other"                 = "No answer",
                               "Other"                 = "Don't know",
                               "Other"                 = "Other party"
      )) %>%
      count(partyid)
    #> # A tibble: 8 × 2
    #>                 partyid     n
    #>                  <fctr> <int>
    #> 1                 Other   548
    #> 2    Republican, strong  2314
    #> 3      Republican, weak  3032
    #> 4 Independent, near rep  1791
    #> 5           Independent  4119
    #> 6 Independent, near dem  2499
    #> # ... with 2 more rows
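
    For the many-to-one part of a recode like the one above, forcats also provides fct_collapse(), which takes a vector of old levels for each new level. A small sketch on a made-up factor (the data here are invented for illustration, not taken from gss_cat):

    ```r
    library(forcats)

    # Hypothetical survey answers
    party <- factor(c("No answer", "Don't know", "Strong democrat", "Other party"))

    # Collapse three old levels into a single new level "Other"
    party2 <- fct_collapse(party, Other = c("No answer", "Don't know", "Other party"))
    levels(party2)
    ```

    This avoids repeating the new level name once per old level, as fct_recode() requires.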
    

    You can even let R decide what categories (factor levels) to merge together.

    Sometimes you just want to lump together all the small groups to make a plot or table simpler. That’s the job of fct_lump(). [...] The default behaviour is to progressively lump together the smallest groups, ensuring that the aggregate is still the smallest group.

    gss_cat %>%
      mutate(relig = fct_lump(relig)) %>%   # default lumping, no n or prop given
      count(relig, sort = TRUE) %>%
      print(n = Inf)
    #> # A tibble: 2 × 2
    #>        relig     n
    #>       <fctr> <int>
    #> 1 Protestant 10846
    #> 2      Other 10637
    
  • 2020-11-28 05:15

    I find this very convenient when several values should be transformed (it's like doing recodes in Stata):

    # load package and gen some data
    require(car)
    x <- 1:10
    
    # do the recoding
    x
    ## [1]   1   2   3   4   5   6   7   8   9  10
    
    recode(x,"10=1; 9=2; 1:4=-99")
    ## [1] -99 -99 -99 -99   5   6   7   8   2   1
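
    If car is not available, the same three rules can be written with plain logical indexing. This is only a sketch of an equivalent for this particular example: the conditions are tested against the original x and the results are written to a copy y (a name chosen here), so a value that has just been recoded cannot be matched again by a later rule.

    ```r
    x <- 1:10
    y <- x                  # work on a copy, test against the original

    y[x == 10]    <- 1
    y[x == 9]     <- 2
    y[x %in% 1:4] <- -99

    y
    ## [1] -99 -99 -99 -99   5   6   7   8   2   1
    ```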
    
  • 2020-11-28 05:35

    Recoding can mean a lot of things, and is fundamentally complicated.

    Changing the levels of a factor can be done using the levels function (the veteran data set used here ships with the survival package):

    > library(survival)
    > #change the levels of a factor
    > levels(veteran$celltype) <- c("s","sc","a","l")
    

    Transforming a continuous variable simply involves the application of a vectorized function:

    > mtcars$mpg.log <- log(mtcars$mpg) 
    

    For binning continuous data, look at cut and cut2 (in the Hmisc package). For example:

    > library(Hmisc)
    > #make 4 groups with roughly equal sample sizes
    > mtcars[['mpg.tr']] <- cut2(mtcars[['mpg']], g=4)
    > #make 4 groups with equal bin width
    > mtcars[['mpg.tr2']] <- cut(mtcars[['mpg']],4, include.lowest=TRUE)
    

    For recoding continuous or factor variables into a categorical variable, there is recode in the car package and recode.variables in the Deducer package:

    > mtcars[c("mpg.tr2")] <- recode.variables(mtcars[c("mpg")] , "Lo:14 -> 'low';14:24 -> 'mid';else -> 'high';")
    

    If you are looking for a GUI, Deducer implements recoding with the Transform and Recode dialogs:

    http://www.deducer.org/pmwiki/pmwiki.php?n=Main.TransformVariables

    http://www.deducer.org/pmwiki/pmwiki.php?n=Main.RecodeVariables
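
    For the low/mid/high recode earlier in this answer, base R's cut() with explicit break points is another option. A sketch, assuming right-closed intervals are acceptable (so 14 falls in "low" and 24 in "mid"); the column name mpg.grp is made up here:

    ```r
    # Bin mpg at 14 and 24; -Inf and Inf catch everything outside the break points
    mtcars$mpg.grp <- cut(mtcars$mpg,
                          breaks = c(-Inf, 14, 24, Inf),
                          labels = c("low", "mid", "high"))

    # Count how many cars fall in each bin
    table(mtcars$mpg.grp)
    ```

    Passing right = FALSE to cut() would close the intervals on the left instead, if that matches the intended rule better.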

  • 2020-11-28 05:36

    I found mapvalues from the plyr package very handy. The package also contains the function revalue, which is similar to car::recode.

    The following example "recodes" the letters r, o, m, a and n to upper case:

    > library(plyr)
    > mapvalues(letters, from = c("r", "o", "m", "a", "n"), to = c("R", "O", "M", "A", "N"))
     [1] "A" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "M" "N" "O" "p" "q" "R" "s" "t" "u" "v" "w" "x" "y" "z"
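
    A similar replacement can be sketched in base R with a named vector used as a lookup table, in case plyr is not installed. The names lookup and hit are chosen here for illustration:

    ```r
    # Old values are the names, new values are the elements
    lookup <- c(r = "R", o = "O", m = "M", a = "A", n = "N")

    x <- letters
    hit <- x %in% names(lookup)         # which elements have a replacement
    x[hit] <- unname(lookup[x[hit]])    # look up by name, leave the rest alone
    x
    ```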
    