Find duplicated rows (based on 2 columns) in Data Frame in R

独厮守ぢ 2020-11-27 06:10

I have a data frame in R which looks like:

| RIC    | Date                | Open   |
|--------|---------------------|--------|
| S1A.PA | 2011-06-30 20:00:00         


        
6 answers
  •  攒了一身酷
    2020-11-27 06:44

    Here's a dplyr option for tagging duplicates based on two (or more) columns, in this case ric and date:

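    library(dplyr)
    
    # Example data; tibble() replaces the deprecated data_frame()
    df <- tibble(ric = c('S1A.PA', 'ABC.PA', 'EFG.PA', 'S1A.PA', 'ABC.PA', 'EFG.PA'),
                 date = c('2011-06-30 20:00:00', '2011-07-03 20:00:00', '2011-07-04 20:00:00', '2011-07-05 20:00:00', '2011-07-03 20:00:00', '2011-07-04 20:00:00'),
                 open = c(23.7, 24.31, 24.495, 24.23, 24.31, 24.495))
    
    # n() is the size of each (ric, date) group; a group with
    # more than one row means those rows are duplicates
    df %>% 
      group_by(ric, date) %>% 
      mutate(dupe = n() > 1)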
    # A tibble: 6 x 4
    # Groups:   ric, date [4]
      ric    date                 open dupe 
      <chr>  <chr>               <dbl> <lgl>
    1 S1A.PA 2011-06-30 20:00:00  23.7 FALSE
    2 ABC.PA 2011-07-03 20:00:00  24.3 TRUE 
    3 EFG.PA 2011-07-04 20:00:00  24.5 TRUE 
    4 S1A.PA 2011-07-05 20:00:00  24.2 FALSE
    5 ABC.PA 2011-07-03 20:00:00  24.3 TRUE 
    6 EFG.PA 2011-07-04 20:00:00  24.5 TRUE 
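    If you would rather avoid the dplyr dependency, a base R sketch of the same tagging uses duplicated() on the two key columns. The data below is copied from the example above; the names key and dupes are just illustrative. duplicated() on its own flags only the second and later copies of a row, so it is combined with fromLast = TRUE to flag every copy, which matches the dupe column produced by the dplyr version.

```r
# Same example data as above, as a plain data.frame
df <- data.frame(
  ric  = c("S1A.PA", "ABC.PA", "EFG.PA", "S1A.PA", "ABC.PA", "EFG.PA"),
  date = c("2011-06-30 20:00:00", "2011-07-03 20:00:00", "2011-07-04 20:00:00",
           "2011-07-05 20:00:00", "2011-07-03 20:00:00", "2011-07-04 20:00:00"),
  open = c(23.7, 24.31, 24.495, 24.23, 24.31, 24.495)
)

# Only the two columns that define a duplicate
key <- df[, c("ric", "date")]

# duplicated() marks later occurrences; fromLast = TRUE marks earlier ones.
# OR-ing the two flags every row whose (ric, date) pair appears more than once.
df$dupe <- duplicated(key) | duplicated(key, fromLast = TRUE)
print(df$dupe)
# [1] FALSE  TRUE  TRUE FALSE  TRUE  TRUE

# Keep only the duplicated rows
dupes <- df[df$dupe, ]
print(dupes)
```

    If you only need to drop the extra copies rather than tag them, duplicated(key) alone (without fromLast) is enough: df[!duplicated(key), ] keeps the first occurrence of each (ric, date) pair.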
    
