find *all* duplicated records in data.table (not all-but-one)

Happy的楠姐 2020-12-15 03:27

If I understand correctly, the duplicated() function for data.table returns a logical vector which doesn't contain the first occurrence of duplicated reco

4 answers
  •  感情败类
    2020-12-15 04:02

    As of data.table version 1.9.8, the solution by eddi needs to be modified to be:

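    # Compare only the key columns (the pre-1.9.8 default); myDT must be keyed.
    dups <- duplicated(myDT, by = key(myDT))
    # Also flag the first row of each duplicated run by looking one row ahead.
    myDT[, fD := dups | c(tail(dups, -1), FALSE)]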
    

    since:

    Changes in v1.9.8 (on CRAN 25 Nov 2016)

    POTENTIALLY BREAKING CHANGES

    By default all columns are now used by unique(), duplicated() and uniqueN() data.table methods, #1284 and #1841. To restore old behaviour: options(datatable.old.unique.by.key=TRUE). In 1 year this option to restore the old default will be deprecated with warning. In 2 years the option will be removed. Please explicitly pass by=key(DT) for clarity. Only code that relies on the default is affected. 266 CRAN and Bioconductor packages using data.table were checked before release. 9 needed to change and were notified. Any lines of code without test coverage will have been missed by these checks. Any packages not on CRAN or Bioconductor were not checked.
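
As a rough illustration of the answer above, here is a minimal sketch on a made-up table. The column names `id`, `grp` and `val` and the data are hypothetical and not from the original question. It assumes data.table 1.9.8 or later and a keyed table, so that rows with equal keys sit next to each other.

```r
library(data.table)

# Hypothetical example table; the columns id, grp and val are illustrative only.
myDT <- data.table(
  id  = c(1, 1, 2, 3, 3, 3),
  grp = c("a", "a", "b", "c", "c", "c"),
  val = 1:6
)
setkey(myDT, id, grp)  # sorts by the key, so rows with equal keys are adjacent

# Default since 1.9.8: all columns are compared. val differs in every row,
# so no row counts as a duplicate here.
dups_all <- duplicated(myDT)

# Restricting the comparison to the key columns, as the answer suggests.
dups <- duplicated(myDT, by = key(myDT))

# Look one row ahead so the first occurrence of each duplicated key is flagged too.
myDT[, fD := dups | c(tail(dups, -1), FALSE)]
print(myDT)
```

In this sketch only the row with `id == 2` ends up with `fD` set to `FALSE`. The look-ahead trick relies on the table being sorted by its key. For an unsorted table, one alternative (not the answer's method) would be to combine `duplicated(myDT, by = cols)` with `duplicated(myDT, by = cols, fromLast = TRUE)`.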
