How to do vlookup and fill down (like in Excel) in R?

悲&欢浪女 2020-11-22 11:25

I have a dataset of about 105000 rows and 30 columns. It contains a categorical variable that I would like to map to a number. In Excel, I would probably do something with

8 Answers
  •  轮回少年
    2020-11-22 11:53

    Solution #2 in @Ben's answer does not generalize to other examples. It happens to give the correct lookup in that example only because the unique HouseType values in houses appear in increasing order of HouseTypeNo. Try this:

    hous <- read.table(header = TRUE,   stringsAsFactors = FALSE,   text="HouseType HouseTypeNo
      Semi            1
      ECIIsHome       17
      Single          2
      Row             3
      Single          2
      Apartment       4
      Apartment       4
      Row             3")
    
    largetable <- data.frame(HouseType = as.character(sample(unique(hous$HouseType), 1000, replace = TRUE)), stringsAsFactors = FALSE)
    lookup <- unique(hous)
    

    Ben's solution #2 gives

    housenames <- as.numeric(1:length(unique(hous$HouseType)))
    names(housenames) <- unique(hous$HouseType)
    base2 <- data.frame(HouseType = largetable$HouseType,
                        HouseTypeNo = (housenames[largetable$HouseType]))
    

    which, when queried for ECIIsHome, returns

    unique(base2$HouseTypeNo[ base2$HouseType=="ECIIsHome" ])
    [1] 2
    

    when the correct answer, according to the lookup table, is 17.
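
    To see where the 2 comes from, the positional numbering can be printed next to the numbers stored in the table. This is only an illustrative sketch: the names `positional` and `comparison` are introduced here and are not part of either answer.

    ```r
    hous <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "HouseType HouseTypeNo
      Semi            1
      ECIIsHome       17
      Single          2
      Row             3
      Single          2
      Apartment       4
      Apartment       4
      Row             3")

    # Positional numbering, as in solution #2: the values are 1..n in order of
    # first appearance and never consult the HouseTypeNo column.
    positional <- seq_along(unique(hous$HouseType))
    names(positional) <- unique(hous$HouseType)

    # Reference mapping taken from the table itself (one row per HouseType).
    lookup <- unique(hous)

    # Side-by-side comparison (hypothetical helper table).
    comparison <- data.frame(HouseType  = lookup$HouseType,
                             positional = unname(positional[lookup$HouseType]),
                             actual     = lookup$HouseTypeNo)
    print(comparison)
    ```

    ECIIsHome is the second distinct value, so it is numbered 2 regardless of what the table says.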

    The correct way to do it is to build the lookup vector from the table itself:

     hous <- read.table(header = TRUE,   stringsAsFactors = FALSE,   text="HouseType HouseTypeNo
          Semi            1
          ECIIsHome       17
          Single          2
          Row             3
          Single          2
          Apartment       4
          Apartment       4
          Row             3")
    
    largetable <- data.frame(HouseType = as.character(sample(unique(hous$HouseType), 1000, replace = TRUE)), stringsAsFactors = FALSE)
    
    housenames <- tapply(hous$HouseTypeNo, hous$HouseType, unique)
    base2 <- data.frame(HouseType = largetable$HouseType,
      HouseTypeNo = (housenames[largetable$HouseType]))
    

    Now the lookups are performed correctly:

    unique(base2$HouseTypeNo[ base2$HouseType=="ECIIsHome" ])
    ECIIsHome 
           17
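
    As a different route to the same result (not the tapply approach above), the lookup can also be sketched with base R's `match()`, which is probably the closest analogue to an exact-match VLOOKUP. This assumes each HouseType maps to exactly one HouseTypeNo; the fixed seed and the name `idx` are additions for illustration only.

    ```r
    hous <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "HouseType HouseTypeNo
      Semi            1
      ECIIsHome       17
      Single          2
      Row             3
      Single          2
      Apartment       4
      Apartment       4
      Row             3")

    # One row per key; stop early if a key maps to more than one number.
    lookup <- unique(hous)
    stopifnot(!anyDuplicated(lookup$HouseType))

    set.seed(1)  # assumption: fixed seed so the sample is repeatable
    largetable <- data.frame(HouseType = sample(lookup$HouseType, 1000, replace = TRUE),
                             stringsAsFactors = FALSE)

    # match() gives, for each row of largetable, the row of its key in lookup
    # (NA when the key is absent), which then indexes the value column.
    idx <- match(largetable$HouseType, lookup$HouseType)
    largetable$HouseTypeNo <- lookup$HouseTypeNo[idx]

    unique(largetable$HouseTypeNo[largetable$HouseType == "ECIIsHome"])
    ```

    Keys missing from the lookup table come back as NA rather than raising an error, so checking `anyNA(largetable$HouseTypeNo)` afterwards may be worthwhile.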
    

    I tried to edit Ben's answer, but the edit was rejected for reasons I cannot understand.
