How to do vlookup and fill down (like in Excel) in R?

悲&欢浪女 2020-11-22 11:25

I have a dataset of about 105,000 rows and 30 columns. It contains a categorical variable that I would like to map to a number. In Excel, I would probably do something with

8 Answers
  •  慢半拍i (OP)
    2020-11-22 11:45

    Using merge differs from a lookup in Excel in two ways. It can duplicate (multiply) rows of your data if the key is not unique in the lookup table (i.e. a primary key constraint is not enforced there), and it can drop records that have no match unless you set all.x = TRUE.
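    A small illustration of both pitfalls, using hypothetical toy data frames named data and lookup (made up for this sketch, not taken from the question). Here "House" appears twice in the lookup table and "Shack" does not appear at all:

```r
# Hypothetical toy data, made up for illustration only.
data <- data.frame(HouseType = c("Apartment", "House", "Shack"),
                   stringsAsFactors = FALSE)
# The key "House" is NOT unique in this lookup table.
lookup <- data.frame(HouseType   = c("Apartment", "House", "House"),
                     HouseTypeNo = c(4, 1, 2),
                     stringsAsFactors = FALSE)

# Default merge() is an inner join: "Shack" has no match and is dropped,
# while "House" matches two lookup rows and is duplicated.
inner <- merge(data, lookup, by = "HouseType")
print(nrow(inner))  # 3 rows: Apartment, House, House

# Left join: "Shack" is kept with HouseTypeNo = NA,
# but "House" is still duplicated, so the row count grows.
left <- merge(data, lookup, by = "HouseType", all.x = TRUE)
print(nrow(left))   # 4 rows, although data has only 3
```

    Neither result has one row per input row, which is what an Excel lookup column would give you.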

    To avoid these problems and look up values safely, I suggest two strategies.

    The first is to check that the lookup key contains no duplicated values:

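    safeLookup <- function(data, lookup, by, select = setdiff(colnames(lookup), by)) {
      # Merges data to lookup, making sure that the number of rows does not change.
      # Fails if any key value appears more than once in the lookup table.
      stopifnot(sum(duplicated(lookup[, by])) == 0)
      # all.x = TRUE keeps rows of data that have no match (filled with NA).
      res <- merge(data, lookup[, c(by, select)], by = by, all.x = TRUE)
      return(res)
    }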
    

    This forces you to de-duplicate the lookup dataset before using it:

    baseSafe <- safeLookup(largetable, house.ids, by = "HouseType")
    # Error: sum(duplicated(lookup[, by])) == 0 is not TRUE 
    
    baseSafe <- safeLookup(largetable, unique(house.ids), by = "HouseType")
    head(baseSafe)
    # HouseType HouseTypeNo
    # 1 Apartment           4
    # 2 Apartment           4
    # ...
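    One caveat worth checking with toy data: unique() only removes rows that are identical in every column. If the same key maps to different values, unique() leaves both rows in place and the check still fails, so you have to decide which row to keep (for example, the first one per key). The sketch below reuses safeLookup from above on hypothetical data frames named data and lookup (made up for illustration):

```r
# safeLookup as defined in the answer above (with all.x = TRUE).
safeLookup <- function(data, lookup, by, select = setdiff(colnames(lookup), by)) {
  stopifnot(sum(duplicated(lookup[, by])) == 0)
  merge(data, lookup[, c(by, select)], by = by, all.x = TRUE)
}

# Hypothetical toy data: "House" maps to two DIFFERENT numbers.
data <- data.frame(HouseType = c("Apartment", "House", "Shack", "Apartment"),
                   stringsAsFactors = FALSE)
lookup <- data.frame(HouseType   = c("Apartment", "House", "House"),
                     HouseTypeNo = c(4, 1, 2),
                     stringsAsFactors = FALSE)

# The duplicated key raises an error instead of silently multiplying rows.
failed <- tryCatch({ safeLookup(data, lookup, by = "HouseType"); FALSE },
                   error = function(e) TRUE)

# unique() does not help here: the two "House" rows differ in HouseTypeNo.
failed_unique <- tryCatch({ safeLookup(data, unique(lookup), by = "HouseType"); FALSE },
                          error = function(e) TRUE)

# Keeping only the first row per key makes the lookup safe;
# the result has exactly one row per row of data, and "Shack" gets NA.
clean <- lookup[!duplicated(lookup$HouseType), ]
res <- safeLookup(data, clean, by = "HouseType")
print(nrow(res) == nrow(data))
```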
    

    The second option is to reproduce Excel's behaviour by taking the first matching row from the lookup dataset:

    firstLookup <- function(data, lookup, by, select = setdiff(colnames(lookup), by)) {
      # Merges data to lookup using the first row per unique combination in by.
      unique.lookup <- lookup[!duplicated(lookup[, by]), ]
      # all.x = TRUE keeps rows of data that have no match (filled with NA).
      res <- merge(data, unique.lookup[, c(by, select)], by = by, all.x = TRUE)
      return(res)
    }
    
    baseFirst <- firstLookup(largetable, house.ids, by = "HouseType")
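    For a single lookup column, a base-R alternative is match(), which returns the position of the first match (or NA), much like an exact-match VLOOKUP in Excel. Unlike merge(), which sorts the result by the key columns by default (sort = TRUE), it also keeps the original row order and can never change the row count. The helper name vlookup and the toy data below are hypothetical, made up for this sketch:

```r
# Hypothetical helper (not from the original answer): for each key, return
# value_col from the FIRST lookup row whose key_col matches, or NA.
vlookup <- function(keys, lookup, key_col, value_col) {
  lookup[[value_col]][match(keys, lookup[[key_col]])]
}

# Hypothetical toy data; "House" appears twice in the lookup table.
data <- data.frame(HouseType = c("Shack", "House", "Apartment", "House"),
                   stringsAsFactors = FALSE)
lookup <- data.frame(HouseType   = c("Apartment", "House", "House"),
                     HouseTypeNo = c(4, 1, 2),
                     stringsAsFactors = FALSE)

# Adds one column in place; rows keep their original order.
data$HouseTypeNo <- vlookup(data$HouseType, lookup, "HouseType", "HouseTypeNo")
print(data$HouseTypeNo)  # NA 1 4 1
```

    Because the first "House" row wins, this mirrors what firstLookup does, but for one column at a time.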
    

    These functions differ slightly from an Excel lookup in that they can add several columns at once (every column listed in select).
