Efficiently convert a date column in data.table

遇见更好的自我 2020-12-16 21:13

I have a large data set with many columns containing dates in two different formats:

"1996-01-04" "1996-01-05" "1996-01-08" "1996-01-09" "1996-01-10"

4 Answers
  •  天命终不由人
    2020-12-16 22:00

    If your dataset contains many duplicated date values, one option is to build a de-duplicated reference table, convert only the unique values, and then map the results back onto the full column. Because the conversion runs once per distinct value rather than once per row, this is faster than converting the date fields on all records.

    Data

    df <- data.frame(
      X1 = c("1996-01-04", "1996-01-05", "1996-01-08", "1996-01-09", "1996-01-10", rep("1996-01-11", 100)), 
      X2 = c("02/01/1996", "03/01/1996", "04/01/1996", "05/01/1996", "08/01/1996", rep("09/01/1996", 100)), 
      stringsAsFactors = FALSE)
    

    Create unique Date rows for mapping

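    date_mapping <- function(date_col){
    
      # Reference table with one row per distinct date string
      ref_df <- data.frame(date1 = unique(date_col), stringsAsFactors = FALSE)
    
      # Convert only the unique values; the whole column is assumed to share one format
      if(all(grepl("/", ref_df$date1))) {
        ref_df$date2 <- as.Date(ref_df$date1, format = "%d/%m/%Y")
    
      } else {
        ref_df$date2 <- as.Date(ref_df$date1)  
      }
    
      # Look up the converted date for every row of the original column
      date_col_mapped <- ref_df[match(date_col, ref_df$date1), "date2"]
    
      return(date_col_mapped)
    
    }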
    
    
    date_mapping(df$X1)
    date_mapping(df$X2)
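
    Since the question is about data.table, the same idea can probably be applied to several columns at once and by reference. The following is only a minimal sketch, assuming each column holds a single format (either dd/mm/YYYY or ISO YYYY-mm-dd); `parse_unique` and `date_cols` are hypothetical names introduced for illustration, not part of the code above.

    ```r
    library(data.table)

    # Hypothetical helper: parse each distinct string once, then look up every row.
    parse_unique <- function(x) {
      u <- unique(x)
      # Assumption: a column containing "/" in every value is dd/mm/YYYY, otherwise ISO.
      fmt <- if (all(grepl("/", u[!is.na(u)], fixed = TRUE))) "%d/%m/%Y" else "%Y-%m-%d"
      as.Date(u, format = fmt)[match(x, u)]
    }

    dt <- data.table(
      X1 = c("1996-01-04", "1996-01-05", rep("1996-01-11", 100)),
      X2 = c("02/01/1996", "03/01/1996", rep("09/01/1996", 100))
    )

    date_cols <- c("X1", "X2")

    # Replace the character columns with Date columns in place (no copy of the table).
    dt[, (date_cols) := lapply(.SD, parse_unique), .SDcols = date_cols]

    str(dt)
    ```

    The gain depends on how many repeated values each column has: with mostly unique dates, the extra `unique()` and `match()` calls may cost more than they save.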
    
