R: Remove duplicates from a dataframe based on categories in a column

耶瑟儿~ 2021-02-15 16:14

Here is my example data set:

      Name Course Cateory
 1: Jason     ML      PT
 2: Jason     ML      DI
 3: Jason     ML      GT
 4: Jason     ML      SY
 5: Ja         


        
7 Answers
  •  情深已故
    2021-02-15 16:35

    You'll need to create an index that represents the priority order of the categories. Then sort by that index and deduplicate by Name and Course, so that the highest-priority category is the row kept for each Name/Course pair.

    library(tidyverse)
    
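    # Lookup table giving each category its priority (1 = highest).
    # "Cateory" is spelled the same way as the column in the original data.
    index.df <- data.frame(Cateory = c("PT", "DI", "GT", "SY"), Index = c(1, 2, 3, 4))
    
    # Join the priority index onto the original dataset
    data <- left_join(data, index.df, by = "Cateory")
    
    # Sort by priority, keep the first row per Name/Course, then drop the helper column.
    # distinct() already deduplicates on the listed columns, so group_by() is not needed.
    data %>%
      arrange(Index) %>%
      distinct(Name, Course, .keep_all = TRUE) %>%
      select(-Index)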
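
    The same sort-then-deduplicate idea can also be sketched in base R, without a join. This is only an illustration under assumptions: the question's data set is truncated, so the rows for "Alex" below are made up, and the priority order PT, DI, GT, SY is taken from the index used in this answer.

```r
# Sample data: the four "Jason" rows come from the question;
# the "Alex" rows are hypothetical, added only to show a second group.
data <- data.frame(
  Name    = c("Jason", "Jason", "Jason", "Jason", "Alex", "Alex"),
  Course  = c("ML", "ML", "ML", "ML", "DS", "DS"),
  Cateory = c("PT", "DI", "GT", "SY", "SY", "GT"),
  stringsAsFactors = FALSE
)

# Priority order, highest first (assumed from the answer's index table)
priority <- c("PT", "DI", "GT", "SY")

# match() returns each row's position in the priority vector;
# categories not listed in `priority` get NA and are sorted last by order()
rank <- match(data$Cateory, priority)

# Sort by priority, then keep the first row for each Name/Course pair
sorted <- data[order(rank), ]
result <- sorted[!duplicated(sorted[, c("Name", "Course")]), ]

print(result)
```

    Here duplicated() flags every row whose Name/Course pair has already appeared, so negating it keeps only the first (highest-priority) row of each pair.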
    
