Find duplicated rows (based on 2 columns) in Data Frame in R

独厮守ぢ 2020-11-27 06:10

I have a data frame in R which looks like:

| RIC    | Date                | Open   |
|--------|---------------------|--------|
| S1A.PA | 2011-06-30 20:00:00 |        |

6 Answers
  •  [愿得一人]
    2020-11-27 06:35

    An easy way to get the information you want is to use dplyr.

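    library(dplyr)

    yourDF %>%
      group_by(RIC, Date) %>%                  # one group per RIC/Date combination
      mutate(num_dups = n(),                   # number of rows in the group
             dup_id = row_number()) %>%        # 1, 2, 3, ... within the group
      ungroup() %>%
      mutate(is_duplicated = dup_id > 1)       # TRUE for every repeat after the first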
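Running the same pipeline on a small, made-up data frame shows what the three new columns contain. The RIC/Date/Open values below are hypothetical toy data, not the asker's data; the S1A.PA / 2011-06-30 pair is deliberately repeated three times.

```r
library(dplyr)

# Hypothetical toy data: S1A.PA at 2011-06-30 20:00 appears three times,
# every other RIC/Date combination appears once.
yourDF <- data.frame(
  RIC  = c("S1A.PA", "S1A.PA", "S1A.PA", "S1A.PA", "VIV.PA"),
  Date = as.POSIXct(c("2011-06-30 20:00:00", "2011-06-30 20:00:00",
                      "2011-07-01 20:00:00", "2011-06-30 20:00:00",
                      "2011-06-30 20:00:00"), tz = "UTC"),
  Open = c(10.1, 10.2, 10.3, 10.4, 20.5)
)

flagged <- yourDF %>%
  group_by(RIC, Date) %>%
  mutate(num_dups = n(),               # rows sharing this RIC/Date pair
         dup_id   = row_number()) %>%  # occurrence number, in original row order
  ungroup() %>%
  mutate(is_duplicated = dup_id > 1)   # TRUE for the 2nd, 3rd, ... occurrence

print(flagged)
```

Grouped mutate() keeps the original row order, so here num_dups is 3, 3, 1, 3, 1 and dup_id is 1, 2, 1, 3, 1.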
    

    In the result:

    • num_dups is the number of rows that share that particular RIC/Date combination (1 means the combination is unique)
    • dup_id tells you which occurrence of that combination the row is (e.g. 1st, 2nd, 3rd, etc.)
    • is_duplicated gives you an easy condition to filter on later: filter(!is_duplicated) removes the repeated rows while keeping the first occurrence of each combination. You could also use dup_id for this (e.g. filter(dup_id == 1))
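As a cross-check, or if you would rather not depend on dplyr, base R's duplicated() applied to just the two key columns flags the same rows as is_duplicated. Combining it with fromLast = TRUE marks every row whose RIC/Date pair occurs more than once, which corresponds to num_dups > 1. This is a swapped-in base R technique, and the toy data frame below is hypothetical.

```r
# Hypothetical toy data with the same shape as the question's RIC/Date/Open frame.
yourDF <- data.frame(
  RIC  = c("S1A.PA", "S1A.PA", "S1A.PA", "S1A.PA", "VIV.PA"),
  Date = c("2011-06-30 20:00:00", "2011-06-30 20:00:00", "2011-07-01 20:00:00",
           "2011-06-30 20:00:00", "2011-06-30 20:00:00"),
  Open = c(10.1, 10.2, 10.3, 10.4, 20.5),
  stringsAsFactors = FALSE
)

keys <- yourDF[, c("RIC", "Date")]  # only the columns that define a duplicate

# TRUE for the 2nd, 3rd, ... occurrence of a pair (matches is_duplicated)
is_repeat <- duplicated(keys)

# TRUE for every row whose pair occurs more than once (matches num_dups > 1)
in_dup_group <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

deduped  <- yourDF[!is_repeat, ]    # first row of each RIC/Date pair
dup_rows <- yourDF[in_dup_group, ]  # all rows involved in a duplication
```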
