unique() for more than one variable

北荒 2020-11-29 02:25

I have the following data frame in R:

> str(df)
'data.frame':   545227 obs. of  15 variables:
 $ ykod : int  93 93 93 93 93 93 93 93 93 93 ...
 $ yad  :         


        
4 Answers
  •  孤街浪徒
    2020-11-29 03:22

    This is an addition to Josh's answer.

    With data.table you can also keep the values of the other variables while filtering out duplicated rows.

    Example:

    library(data.table)
    
    #create data table
    dt <- data.table(
      V1=LETTERS[c(1,1,1,1,2,3,3,5,7,1)],
      V2=LETTERS[c(2,3,4,2,1,4,4,6,7,2)],
      V3=c(1),
      V4=c(2) )
    
    dt  # print the table
    # V1 V2 V3 V4
    # A  B  1  2
    # A  C  1  2
    # A  D  1  2
    # A  B  1  2
    # B  A  1  2
    # C  D  1  2
    # C  D  1  2
    # E  F  1  2
    # G  G  1  2
    # A  B  1  2
    
    # set the key to all columns
    setkey(dt)
    
    # get the unique rows: self-join on V1 and V2, then drop duplicated rows
    unique( dt[list(V1, V2), nomatch = 0] )
    
    # V1 V2 V3 V4
    # A  B  1  2
    # A  C  1  2
    # A  D  1  2
    # B  A  1  2
    # C  D  1  2
    # E  F  1  2
    # G  G  1  2
    

    Caution: if the other variables (here V3 and V4) take different values within the same V1/V2 pair, your result will be the unique combinations of V1 and V2.
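
    How this caveat plays out depends on which columns unique() compares. Below is a minimal sketch, assuming data.table 1.9.8 or later, where to my knowledge unique() compares all columns unless `by` is given. The table `dt2` and its V3 values are hypothetical, made up only so that V3 differs within a V1/V2 pair.

```r
library(data.table)

# Same V1/V2 pairs as in the example above, but V3 now differs on every row
# (hypothetical values, chosen only for illustration)
dt2 <- data.table(
  V1 = LETTERS[c(1, 1, 1, 1, 2, 3, 3, 5, 7, 1)],
  V2 = LETTERS[c(2, 3, 4, 2, 1, 4, 4, 6, 7, 2)],
  V3 = 1:10,
  V4 = 2
)

# Duplicates judged on every column: no two rows are identical here
all_cols <- unique(dt2)

# Duplicates judged on V1 and V2 only: the first row of each pair is kept,
# together with that row's V3 and V4 values
by_pair <- unique(dt2, by = c("V1", "V2"))

nrow(all_cols)
nrow(by_pair)
```

    With `by = c("V1", "V2")` only one row per V1/V2 pair survives, so the V3 and V4 values of the later rows in each pair are dropped. If those values matter, aggregate them explicitly instead of relying on unique().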
