Finding ALL duplicate rows, including “elements with smaller subscripts”

借酒劲吻你 2020-11-21 07:55

R's duplicated returns a vector showing whether each element of a vector or data frame is a duplicate of an element with a smaller subscript. So if rows 3, 4,
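To make the behaviour described above concrete, here is a small base-R sketch on a made-up data frame. It also shows a common workaround that is not part of the original question: combining duplicated() with its fromLast = TRUE argument, so that the first copy of each repeated row is flagged as well.

```r
# Toy data frame: rows 1, 3 and 5 are identical, as are rows 2 and 4.
df <- data.frame(x = c(1, 2, 1, 2, 1, 3),
                 y = c("a", "b", "a", "b", "a", "c"))

# duplicated() only flags copies that appear after an earlier identical row.
duplicated(df)                    # FALSE FALSE TRUE TRUE TRUE FALSE

# Scanning from the end flags the earlier copies instead.
duplicated(df, fromLast = TRUE)   # TRUE TRUE TRUE FALSE FALSE FALSE

# OR-ing the two marks every row that has at least one identical twin.
all_dups <- duplicated(df) | duplicated(df, fromLast = TRUE)
df[all_dups, ]                    # rows 1 to 5; row 6 is unique
```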

  •  暗喜 (OP)
    2020-11-21 08:37

    If you are interested in which rows are duplicated for certain columns, you can use a plyr approach:

    library(plyr)
    ddply(df, .(col1, col2), function(d) if (nrow(d) > 1) d else NULL)  # keep only groups with >1 row
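    If plyr is not available, the same idea (keep every row whose col1/col2 combination occurs more than once) can be sketched in base R with ave(), which is a swapped-in technique rather than the answer's own. The column names and data are made up for illustration. Unlike ddply, this keeps the rows in their original order instead of sorting them by group.

```r
# Toy data: the (col1, col2) pairs (1, "a") and (2, "b") repeat; col3 differs.
df <- data.frame(col1 = c(1, 2, 1, 2, 3),
                 col2 = c("a", "b", "a", "b", "a"),
                 col3 = 1:5)

# ave() applies FUN within each (col1, col2) group and returns a vector
# aligned with the original rows; FUN = length gives each row's group size.
group_size <- ave(seq_len(nrow(df)), df$col1, df$col2, FUN = length)

dup_rows <- df[group_size > 1, ]   # every member of a repeated group (rows 1 to 4)
```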
    

    Adding a count variable with dplyr:

    library(dplyr)
    df %>% add_count(col1, col2) %>% filter(n > 1)  # data frame
    df %>% add_count(col1, col2) %>% pull(n) > 1    # logical vector
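    A small usage sketch of the add_count() idiom on invented data. Note that pull(n) is used for the logical vector, because select(n) > 1 compares a one-column data frame and therefore yields a logical matrix rather than a plain vector.

```r
library(dplyr)

df <- data.frame(col1 = c(1, 2, 1, 2, 3),
                 col2 = c("a", "b", "a", "b", "a"),
                 col3 = 1:5)

# add_count() appends a column n holding the size of each (col1, col2) group.
counted <- df %>% add_count(col1, col2)

dups   <- counted %>% filter(n > 1)   # rows 1 to 4, in their original order
is_dup <- counted %>% pull(n) > 1     # TRUE TRUE TRUE TRUE FALSE
```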
    

    For duplicate rows (considering all columns):

    df %>% group_by_all %>% add_tally %>% ungroup %>% filter(n > 1)  # data frame
    df %>% group_by_all %>% add_tally %>% ungroup %>% pull(n) > 1    # logical vector
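    For comparison, the same per-row count over all columns can be sketched in base R without dplyr. The helper name row_counts is hypothetical. It keys each row by pasting its columns together, which assumes the separator never occurs in the data and that doubles are distinguishable after conversion to text (as.character keeps 15 significant digits).

```r
df <- data.frame(x = c(1, 2, 1, 2, 1, 3),
                 y = c("a", "b", "a", "b", "a", "c"))

# Hypothetical helper: for each row, how many rows of df are identical to it.
row_counts <- function(df) {
  # One text key per row; "\r" is assumed not to appear in any value.
  key <- do.call(paste, c(unname(as.list(df)), sep = "\r"))
  id  <- match(key, key)               # index of the first row with the same key
  tabulate(id, nbins = nrow(df))[id]   # size of each row's group
}

n <- row_counts(df)   # 3 2 3 2 3 1
df[n > 1, ]           # all rows that occur more than once
```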
    

    The benefit of these approaches is that you can choose the cutoff, i.e. how many times a row must occur before it is kept (for example filter(n > 2) instead of filter(n > 1)).
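    The cutoff can be made a parameter. The sketch below wraps the add_count() idiom in a hypothetical helper, rows_occurring_at_least(), that forwards the grouping columns through ... and uses .env$min_n so the threshold is never confused with a data column. The data are made up.

```r
library(dplyr)

# Hypothetical helper: keep rows whose combination of the given columns
# occurs at least min_n times (min_n = 2 matches the filter(n > 1) idiom).
rows_occurring_at_least <- function(df, ..., min_n = 2) {
  df %>%
    add_count(...) %>%             # n = size of each group defined by ...
    filter(n >= .env$min_n) %>%    # .env avoids clashing with a column named min_n
    select(-n)                     # drop the helper count column again
}

df <- data.frame(col1 = c(1, 2, 1, 2, 1),
                 col2 = c("a", "b", "a", "b", "a"))

rows_occurring_at_least(df, col1, col2)             # all 5 rows
rows_occurring_at_least(df, col1, col2, min_n = 3)  # only the three (1, "a") rows
```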
