How to output duplicated rows

Posted by 廉价感情 on 2019-12-17 14:58:15

Question


I have the following data:

x1  x2  x3  x4
34  14  45  53 
2   8   18  17
34  14  45  20
19  78  21  48 
2   8   18  5

In rows 1 and 3, and in rows 2 and 5, the values in columns x1, x2 and x3 are equal. How can I output only those 4 rows, the ones with equal values? The output should be in the following format:

x1  x2  x3  x4
34  14  45  53
34  14  45  20
2   8   18  17
2   8   18  5

Please ask if anything is unclear.

ADDITIONAL QUESTION: in the output

x1  x2  x3  x4
34  14  45  53
34  14  45  20
2   8   18  17
2   8   18  5

find the sum of the values in the last column for each group of equal rows:

x1  x2  x3  x4
34  14  45  73
2   8   18  22

Answer 1:


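You can do this with duplicated, which, when passed a data frame or matrix, flags every row that repeats an earlier row. Since you're only checking the first three columns, you should pass dat[,-4] to the function. A second call with fromLast=TRUE flags every row that repeats a later one, so combining the two with | selects all members of each duplicated group.

dat[duplicated(dat[,-4]) | duplicated(dat[,-4], fromLast=TRUE),]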
#   x1 x2 x3 x4
# 1 34 14 45 53
# 2  2  8 18 17
# 3 34 14 45 20
# 5  2  8 18  5
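
None of the answers show how dat is built, so here is a minimal, self-contained sketch that rebuilds the question's data and separates the two duplicated calls. The variable names key, later, earlier and dups are illustrative and do not come from the original answer.

```r
# Rebuild the data from the question (the name `dat` follows the answers)
dat <- data.frame(
  x1 = c(34, 2, 34, 19, 2),
  x2 = c(14, 8, 14, 78, 8),
  x3 = c(45, 18, 45, 21, 18),
  x4 = c(53, 17, 20, 48, 5)
)

# Only the first three columns decide whether two rows count as equal
key <- dat[, c("x1", "x2", "x3")]

# TRUE for rows that repeat an EARLIER row (rows 3 and 5 here)
later <- duplicated(key)
# TRUE for rows that repeat a LATER row (rows 1 and 2 here)
earlier <- duplicated(key, fromLast = TRUE)

# Keep every member of a duplicated group
dups <- dat[later | earlier, ]
print(dups)
```

Row 4 is dropped because its x1, x2, x3 combination occurs only once.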



Answer 2:


An alternative using ave:

dat[ave(dat[,1], dat[-4], FUN=length) > 1,]

#  x1 x2 x3 x4
#1 34 14 45 53
#2  2  8 18 17
#3 34 14 45 20
#5  2  8 18  5
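
To make the ave step easier to follow, the sketch below (assuming the same dat as in the question) stores the intermediate vector: ave replaces each value with the size of its x1/x2/x3 group, and rows whose group size exceeds 1 are kept. The name n is illustrative only.

```r
# Rebuild the data from the question
dat <- data.frame(
  x1 = c(34, 2, 34, 19, 2),
  x2 = c(14, 8, 14, 78, 8),
  x3 = c(45, 18, 45, 21, 18),
  x4 = c(53, 17, 20, 48, 5)
)

# For every row, count how many rows share its (x1, x2, x3) combination.
# The first argument only supplies a vector of the right length;
# FUN = length ignores its values and returns the group size.
n <- ave(dat$x1, dat$x1, dat$x2, dat$x3, FUN = length)
print(n)

# Rows that belong to a group of size 2 or more
print(dat[n > 1, ])
```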



Answer 3:


Learned this one the other day. You won't need to re-order the output.

s <- split(dat, do.call(paste, dat[-4]))
Reduce(rbind, Filter(function(x) nrow(x) > 1, s))
#   x1 x2 x3 x4
# 2  2  8 18 17
# 5  2  8 18  5
# 1 34 14 45 53
# 3 34 14 45 20



Answer 4:


Another way to answer both questions uses two packages, DescTools and dplyr. Grouping by x1 and x2 is enough for this data; in general, group by x3 as well.

library(DescTools)
library(dplyr)
dat[AllDuplicated(dat[1:3]), ] %>% # this line is to find duplicates
  group_by(x1, x2) %>% # the lines followed are to sum up
  mutate(x4 = sum(x4)) %>%
  unique()
# Source: local data frame [2 x 4]
# Groups: x1, x2
# 
#   x1 x2 x3 x4
# 1 34 14 45 73
# 2  2  8 18 22



Answer 5:


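You can also use table. Note that this checks x1 and x2 separately rather than the three columns jointly, which is enough for this data:

> # the names of the table entries with a count above 1 are the repeated values
> d1 = ddf[ddf$x1 %in% names(which(table(ddf$x1) > 1)),]
> d2 = ddf[ddf$x2 %in% names(which(table(ddf$x2) > 1)),]
> rr = rbind(d1, d2)
> rr[!duplicated(rr),]
  x1 x2 x3 x4
1 34 14 45 53
2  2  8 18 17
3 34 14 45 20
5  2  8 18  5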

For the sum of the last column:

> library(data.table)
> rrt = data.table(rr[!duplicated(rr),])
> rrt[, x4 := sum(x4), by = x1]
> rrt[!duplicated(x1)]
   x1 x2 x3 x4
1: 34 14 45 73
2:  2  8 18 22



Answer 6:


The first step is similar to the answers above. Let z be your data.frame:

 library(DescTools)
 (zz <- Sort(z[AllDuplicated(z[, -4]), ], decreasing=TRUE) )

 # now aggregate
 aggregate(zz[, 4], zz[, -4], FUN=sum)

 # use Sort again, if needed...
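
If DescTools is not available, the same two steps can be sketched in base R. This swaps AllDuplicated for two duplicated calls and uses the formula interface of aggregate; it is one possible equivalent, not the answer's own code, and the names keep and sums are illustrative.

```r
# Rebuild the data from the question (named `z` as in this answer)
z <- data.frame(
  x1 = c(34, 2, 34, 19, 2),
  x2 = c(14, 8, 14, 78, 8),
  x3 = c(45, 18, 45, 21, 18),
  x4 = c(53, 17, 20, 48, 5)
)

# Step 1: keep every row whose first three columns occur more than once
keep <- duplicated(z[, -4]) | duplicated(z[, -4], fromLast = TRUE)
zz <- z[keep, ]

# Step 2: sum x4 within each x1/x2/x3 group
sums <- aggregate(x4 ~ x1 + x2 + x3, data = zz, FUN = sum)
print(sums)
```

The formula interface keeps the column name x4 in the result, which matches the format asked for in the question.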


Source: https://stackoverflow.com/questions/26520828/how-to-output-duplicated-rows
