Creating a new variable from a lookup table

佛祖请我去吃肉 2020-11-29 04:39

I have the following columns in my data set:

presult     aresult
  I         single
  I         double
  I         triple
  I         home run
  SS        st         


        
4 Answers
  • 2020-11-29 05:17

    Define your lookup table:

    library(plyr)  # provides join()

    lookup <- data.frame(
                base=c(0,1,2,3,4),
                aresult=c("strikeout","single","double","triple","home run"))
    

    Then use join from plyr, matching rows on the shared aresult column:

    dataset <- join(dataset, lookup, by='aresult')
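
    If you would rather not depend on plyr, the same lookup can be sketched in base R with match(). This is a swapped-in technique, not what the answer above uses, and the sample dataset (including the unmatched value "walk") is made up for illustration:

    ```r
    # Lookup table with the same contents as in the answer above
    lookup <- data.frame(
      base = c(0, 1, 2, 3, 4),
      aresult = c("strikeout", "single", "double", "triple", "home run"),
      stringsAsFactors = FALSE
    )

    # Hypothetical sample data; "walk" is deliberately missing from the lookup table
    dataset <- data.frame(
      presult = c("I", "I", "SS", "ZZ"),
      aresult = c("single", "home run", "strikeout", "walk"),
      stringsAsFactors = FALSE
    )

    # match() gives, for each aresult, its row position in lookup (NA if absent),
    # so the row order of dataset is left unchanged
    dataset$base <- lookup$base[match(dataset$aresult, lookup$aresult)]
    ```

    Values that do not appear in the lookup table come back as NA, which makes gaps in the table easy to spot with is.na(dataset$base).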
    
  • 2020-11-29 05:18

    An alternative to Dieter's answer:

    dat <- data.frame(
      presult = c(rep("I", 4), "SS", "ZZ"),
      aresult = c("single", "double", "triple", "home run", "strikeout", "home run"),
      stringsAsFactors=FALSE
    )
    
    dat$base <- as.integer(factor(dat$aresult,
      levels=c("strikeout","single","double","triple","home run")))-1
    
  • 2020-11-29 05:40

    Here is how to use a named vector for the lookup:

    Define test data:

    dat <- data.frame(
        presult = c(rep("I", 4), "SS", "ZZ"),
        aresult = c("single", "double", "triple", "home run", "strikeout", "home run"),
        stringsAsFactors=FALSE
    )
    

    Define a named numeric vector with the scores:

    score <- c(single=1, double=2, triple=3, `home run`=4,  strikeout=0)
    

    Use vector indexing to match the scores against results:

    dat$base <- score[dat$aresult]
    dat
      presult   aresult base
    1       I    single    1
    2       I    double    2
    3       I    triple    3
    4       I  home run    4
    5      SS strikeout    0
    6      ZZ  home run    4
    

    Additional information:

    If you don't want to write out the named vector by hand, for example when there are many values, build it in two steps:

    score <- c(1:4, 0)
    names(score) <- c("single", "double", "triple", "home run", "strikeout")
    

    (Or read the values and names from existing data. The point is to construct a numeric vector and then assign names.)
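
    As a rough sketch of reading the values and names from existing data, assuming the scores already live in a two-column data frame (the name lookup and its columns are hypothetical here), setNames() does both steps at once:

    ```r
    # Hypothetical lookup data frame; in practice this might come from a file
    lookup <- data.frame(
      aresult = c("single", "double", "triple", "home run", "strikeout"),
      base = c(1, 2, 3, 4, 0),
      stringsAsFactors = FALSE
    )

    # Build the named numeric vector: values from one column, names from the other
    score <- setNames(lookup$base, lookup$aresult)

    # Index by name, exactly as with the hand-written vector
    home_run_base <- score[["home run"]]
    ```

    Passing stringsAsFactors=FALSE keeps aresult as character, so the names are the strings themselves on any R version.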

  • 2020-11-29 05:43
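    dataset$base <- as.integer(as.factor(dataset$aresult))
    

    Depending on your data, as.factor() can be omitted, because strings are often already factors, e.g. when read with read.table in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE. Note that the integer codes follow the factor's level order, which is alphabetical by default and starts at 1, so they equal the intended base values only if the levels are set explicitly.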
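
    A small check of what the integer codes look like with default versus explicit levels may help here. The vector below is illustrative only, and the explicit level order is borrowed from the factor-based answer earlier in this thread:

    ```r
    aresult <- c("single", "double", "triple", "home run", "strikeout")

    # Default levels are the sorted unique values:
    # "double", "home run", "single", "strikeout", "triple"
    codes_default <- as.integer(as.factor(aresult))

    # Explicit levels in scoring order; subtract 1 so that strikeout maps to 0
    codes_explicit <- as.integer(factor(
      aresult,
      levels = c("strikeout", "single", "double", "triple", "home run")
    )) - 1L
    ```

    With default levels "single" gets code 3 rather than 1, so the one-liner is only suitable when any distinct integer per category will do.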
