Keep before and after date of an external list

旧巷少年郎 2020-12-12 04:40

Given this data frame:

dframe1 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L), name = c("Google", 
"Yahoo", "Amazon", "Amazon", "Google"), date = 


        
3 Answers
  • 2020-12-12 05:05

    If I understand correctly, the OP wants to find entries that match on id and name and whose date is the day before or the day after. Therefore, a non-equi join will not help, as it would also include matches on the day itself.
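
The difference between a ±1-day window and "exactly one day apart" can be checked in plain base R on a few made-up dates (d1, d2, diff_days, in_window and adjacent are illustrative names, not from the question). This is only a sketch of the two conditions, not of the join itself:

```r
# Made-up example dates (not the OP's data)
d1 <- as.Date("2008-11-01")
d2 <- as.Date(c("2008-10-31", "2008-11-01", "2008-11-02", "2008-11-04"))

# Signed difference in days between each candidate date and d1
diff_days <- as.numeric(d2 - d1)

# Range condition d1 - 1 <= d2 <= d1 + 1, as a non-equi join would use
in_window <- abs(diff_days) <= 1
# Only the day before or the day after
adjacent <- abs(diff_days) == 1

print(d2[in_window])  # includes 2008-11-01 itself
print(d2[adjacent])   # 2008-10-31 and 2008-11-02 only
```

The range condition keeps the same-day date, which is why the answer below joins twice on exact shifted dates instead.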

    I suggest performing two inner joins, one for the day before and one for the day after, using lapply(). The results are then combined with rbindlist(), which also adds the new column matching_day requested by the OP:

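    library(data.table)
    library(magrittr)
    # convert the date columns to Date so that +/- 1 shifts by one day
    setDT(dframe1)[, date := as.Date(date)]
    setDT(dframe2)[, date := as.Date(date)]
    
    # for each offset, shift dframe1's dates and inner-join (nomatch = 0L) with dframe2
    lapply(
      c(-1, +1), 
      function(x) dframe2[dframe1[, .(id, name, date = date + x)], on = .(id, name, date), nomatch = 0L]
    ) %>%
      set_names(c("before", "after")) %>% 
      # stack both results; idcol adds the matching_day column from the list names
      rbindlist(idcol = "matching_day") %>% 
      .[order(id)]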
    
        matching_day id       date   name     text_sth
     1:       before  1 2008-10-31 Google another text
     2:       before  1 2008-10-31  Yahoo        other
     3:        after  1 2008-11-02 Google         test
     4:        after  1 2008-11-02 Google another text
     5:        after  1 2008-11-02  Yahoo     text_sth
     6:        after  1 2008-11-05 Amazon    text here
     7:       before  2 2008-10-31 Amazon          etc
     8:       before  2 2008-11-01 Google         test
     9:        after  2 2008-11-02 Amazon another text
    10:        after  2 2008-11-03 Google    text here
    
  • 2020-12-12 05:14

    A base R way is to transform dframe1 into a data frame dframe1a that already contains the desired dates (the day before and the day after each original date), and then merge() it with dframe2.

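    # for every row of dframe1, build two rows holding date - 1 and date + 1
    # (column 3, the original date, is dropped and replaced)
    dframe1a <- do.call(rbind, lapply(seq_len(nrow(dframe1)), function(m) 
      cbind(dframe1[m, -3], date=as.matrix(dframe1[m, "date"] + c(-1, 1)), row.names=NULL)))
    # as.matrix() strips the Date class, so convert the day counts back to Date
    dframe1a$date <- as.Date(as.numeric(as.character(dframe1a$date)), origin="1970-01-01")
    # inner join on the columns both data frames share
    merge(dframe2, dframe1a)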
    #    id       date   name     text_sth
    # 1   1 2008-10-31 Google another text
    # 2   1 2008-10-31  Yahoo        other
    # 3   1 2008-11-02 Google another text
    # 4   1 2008-11-02 Google         test
    # 5   1 2008-11-02  Yahoo     text_sth
    # 6   1 2008-11-05 Amazon    text here
    # 7   2 2008-10-31 Amazon          etc
    # 8   2 2008-11-01 Google         test
    # 9   2 2008-11-02 Amazon another text
    # 10  2 2008-11-03 Google    text here
    

    Note: the original date columns must already be of class Date, e.g. dframe1$date <- as.Date(dframe1$date).
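
The same shift-and-join idea can also carry the before/after label in base R. The sketch below uses small hypothetical data frames toy1 and toy2 standing in for dframe1 and dframe2 (not the OP's data); offsets, shifted and res are illustrative names. It stacks one shifted copy of toy1 per offset and then does an inner merge():

```r
# Hypothetical toy data standing in for dframe1/dframe2 (not the OP's data)
toy1 <- data.frame(
  id   = c(1L, 2L),
  name = c("Google", "Amazon"),
  date = as.Date(c("2008-11-01", "2008-11-01"))
)
toy2 <- data.frame(
  id       = c(1L, 1L, 1L, 2L),
  name     = c("Google", "Google", "Google", "Amazon"),
  date     = as.Date(c("2008-10-31", "2008-11-01", "2008-11-02", "2008-11-03")),
  text_sth = c("a", "b", "c", "d")
)

# Day offsets, named by the label each shifted copy should carry
offsets <- c(before = -1, after = 1)

# One copy of toy1 per offset, with the date shifted and the label attached
shifted <- do.call(rbind, lapply(names(offsets), function(lbl) {
  transform(toy1, date = date + offsets[[lbl]], matching_day = lbl)
}))

# Inner join on id, name and the shifted date, so same-day rows cannot match
res <- merge(toy2, shifted, by = c("id", "name", "date"))
print(res)
```

With this toy data only the Google rows dated 2008-10-31 ("before") and 2008-11-02 ("after") come back; the same-day row (2008-11-01) and the Amazon row two days later do not.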

  • 2020-12-12 05:32

    One approach is to expand the dframe1 dataset so that it includes rows with the -1 and +1 day dates for each id and name. We then remove the original rows of dframe1 and do an inner_join() with dframe2.

    library(dplyr)
    
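    dframe1 %>%
      mutate(date = as.Date(date), date1 = date) %>%
      group_by(id, name) %>%
      # fill in the day before and the day after within each id/name group
      tidyr::complete(date1 = seq(date1 - 1, date1 + 1, by = "1 day")) %>%
      # drop the original row; the added rows have date = NA
      filter(date1 != date | is.na(date)) %>%
      select(-date) %>%
      # date1 is now column 3; rename it to date for the join
      rename(date = 3) %>%
      inner_join(dframe2 %>% mutate(date = as.Date(date)))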
    
    #Joining, by = c("id", "name", "date")
    # A tibble: 10 x 4
    # Groups:   id, name [5]
    #      id name   date       text_sth    
    #   <int> <chr>  <date>     <chr>       
    # 1     1 Amazon 2008-11-05 text here   
    # 2     1 Google 2008-10-31 another text
    # 3     1 Google 2008-11-02 test        
    # 4     1 Google 2008-11-02 another text
    # 5     1 Yahoo  2008-10-31 other       
    # 6     1 Yahoo  2008-11-02 text_sth    
    # 7     2 Amazon 2008-10-31 etc         
    # 8     2 Amazon 2008-11-02 another text
    # 9     2 Google 2008-11-01 test        
    #10     2 Google 2008-11-03 text here 
    

    To add a new column with the before/after label, we can add another mutate() statement.

    dframe1 %>%
      mutate(date = as.Date(date), date1 = date) %>%
      group_by(id, name) %>%
      tidyr::complete(date1 = seq(date1 - 1, date1 + 1, by = "1 day")) %>%
      filter(date1 != date | is.na(date)) %>%
      select(-date) %>%
      # label the two remaining rows of each group by position
      mutate(col = c("before", "after")) %>%
      rename(date = 3) %>%
      inner_join(dframe2 %>% mutate(date = as.Date(date)))
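
The mutate(col = c("before", "after")) step assigns labels by position, so it relies on each id/name group holding exactly two rows with the earlier date first. A base R sketch of the same expand-and-drop step is shown below, using a hypothetical toy1 data frame (not the OP's data) and illustrative names combos and expanded; it derives the label from the sign of the day offset instead of from row order:

```r
# Hypothetical toy data (not the OP's): one row per id/name group
toy1 <- data.frame(
  id   = c(1L, 2L),
  name = c("Google", "Amazon"),
  date = as.Date(c("2008-11-01", "2008-11-04"))
)

# Cartesian product of each row with the offsets -1, 0, +1
# (merge() with by = NULL), similar to complete() filling a 3-day sequence
combos <- merge(toy1, data.frame(offset = c(-1, 0, 1)), by = NULL)
combos$date1 <- combos$date + combos$offset

# Drop the original day, as filter(date1 != date | is.na(date)) does
expanded <- combos[combos$offset != 0, c("id", "name", "date1", "offset")]
names(expanded)[names(expanded) == "date1"] <- "date"

# Derive the label from the sign of the offset, not from row position
expanded$col <- ifelse(expanded$offset < 0, "before", "after")
expanded <- expanded[order(expanded$id, expanded$date), ]
print(expanded)
```

The expanded rows can then be inner-joined to the second table on id, name and date, e.g. with merge().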
    