Question
So, I imported a dataset with 178 observations and 8 variables. The end goal was to eliminate all observations that were the same across three of those variables (2, 5, and 6). This proved quite easy using the unique command:
mav2 <- unique(mav[,c(2,5,6)])
The resulting mav2 dataframe has 55 observations, getting rid of all the duplicates! Unfortunately, it also got rid of the other five variables that I did not use in the unique command (1, 3, 4, 7, and 8). I initially tried adding the two dataframes, but of course this did not work since they are of unequal size. I have also tried merging the two, but this fails and just gives an output of the first dataset with all 178 observations.
The second dataset (mav2) did produce a new column (row.names), which is the row number of each observation in the initial dataset.
If anyone could help me out on getting all 8 initial variables into a dataset with only the 55 unique observations, I would be very appreciative. Thanks in advance.
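Answer 1:
I think what you want is duplicated, a function similar to unique that returns a logical vector flagging each row that repeats an earlier one. Negating it keeps the first occurrence of each combination along with all the other columns. So: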
mav2 <- mav[!duplicated(mav[,c(2,5,6)]),]
EDIT: inverted sense of duplicated
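To make the difference between unique and duplicated concrete, here is a minimal sketch on a made-up data frame. The name toy, its columns, and its values are illustrative assumptions, not the asker's data; only the key columns (2 and 3 here, standing in for 2, 5, and 6) are checked for duplicates.

```r
# Hypothetical toy data: rows 1 and 3 match on columns 2 and 3,
# and so do rows 2 and 5.
toy <- data.frame(
  id    = 1:5,
  site  = c("a", "b", "a", "c", "b"),
  year  = c(2001, 2002, 2001, 2001, 2002),
  value = c(10, 20, 30, 40, 50)
)

key_cols <- c(2, 3)

# unique() on the subset returns only the key columns
only_keys <- unique(toy[, key_cols])

# duplicated() returns one TRUE/FALSE per row; negating it keeps the
# first occurrence of each key combination with every other column intact
is_dup <- duplicated(toy[, key_cols])
toy_unique <- toy[!is_dup, ]

print(is_dup)
print(toy_unique)
```

Note that this keeps the first row of each duplicated group, so which values survive in the non-key columns depends on the row order.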
Answer 2:
You can try this:
mav$key <- 1:nrow(mav)
mav2 <- unique(mav[,c(2,5,6)])
mav_unique <- mav[mav$key%in%mav2$key,]
mav_unique$key <- NULL
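EDIT: to address the key issue (mav2 is built from columns 2, 5, and 6 only, so it has no key column to match on), use the row names instead: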
rownames(mav) <- 1:nrow(mav) #to make sure they are correctly set
mav2 <- unique(mav[,c(2,5,6)])
mav_unique <- mav[rownames(mav)%in%rownames(mav2),]
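As a quick check, the row-name approach can be compared against the duplicated approach on a small made-up data frame. This is only a sketch: toy and its columns are hypothetical, and columns 2 and 3 stand in for the asker's columns 2, 5, and 6.

```r
# Hypothetical toy data, not the asker's dataset
toy <- data.frame(
  id    = 1:5,
  site  = c("a", "b", "a", "c", "b"),
  year  = c(2001, 2002, 2001, 2001, 2002),
  value = c(10, 20, 30, 40, 50)
)

# Make sure the row names are simply 1..n
rownames(toy) <- 1:nrow(toy)

# unique() keeps the row names of the first occurrence of each combination
keys <- unique(toy[, c(2, 3)])

# Select the full rows whose row names survived in keys
by_rownames <- toy[rownames(toy) %in% rownames(keys), ]

# The same selection via duplicated()
by_duplicated <- toy[!duplicated(toy[, c(2, 3)]), ]

print(identical(by_rownames, by_duplicated))
```

Both routes select the same rows here; the duplicated version simply skips the intermediate data frame.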
Answer 3:
You can try doing this, which drops the rows where v2, v5, and v6 all hold the same value:
mav[!(mav$v2==mav$v5 & mav$v5==mav$v6),]
Example:
mav <- data.frame(v1=c(1,2,3),v2=c(2,6,4),v3=c(4,5,6),v4=c(1,5,2),v5=c(5,6,7),v6=c(5,6,8),v7=c(7,4,5),v8=c(6,3,1))
mav
  v1 v2 v3 v4 v5 v6 v7 v8
1  1  2  4  1  5  5  7  6
2  2  6  5  5  6  6  4  3
3  3  4  6  2  7  8  5  1
In the data frame above, the 2nd row has the same value (6) in columns v2, v5, and v6.
To drop it, do the following:
mav <- mav[!(mav$v2==mav$v5 & mav$v5==mav$v6),]
This gives you:
mav
  v1 v2 v3 v4 v5 v6 v7 v8
1  1  2  4  1  5  5  7  6
3  3  4  6  2  7  8  5  1
All the other columns are retained.
Source: https://stackoverflow.com/questions/31148152/having-trouble-keeping-all-variables-after-removing-duplicates-from-a-dataset