Question
Say I have large datasets in R and I just want to know whether two of them are the same. I do this often when I'm experimenting with different algorithms that should produce the same result. For example, say we have the following datasets:
df1 <- data.frame(num = 1:5, let = letters[1:5])
df2 <- df1
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
df4 <- df3
So this is what I do to compare them:
table(x == y, useNA = 'ifany')
Which works great when the datasets have no NAs:
> table(df1 == df2, useNA = 'ifany')
TRUE
10
But not so much when they have NAs:
> table(df3 == df4, useNA = 'ifany')
TRUE <NA>
11 1
In the example, it's easy to dismiss the NA as not a problem, since we know that both data frames are equal. The problem is that NA == <anything> yields NA, so whenever one of the datasets has an NA, it doesn't matter what the other one has in that same position: the result is always going to be NA.
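To make the NA behaviour concrete, here is a minimal sketch that reuses df3 and df4 from above. The names cmp, both_na, same_cell and df5 are introduced only for illustration, and the cell-wise check assumes both data frames have the same dimensions.

```r
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
df4 <- df3

# Comparing anything with NA gives NA, even NA itself
r1 <- NA == 1
r2 <- NA == NA

# `==` on two data frames returns a logical matrix; the NA cell is num[6]
cmp <- df3 == df4

# NA-aware cell comparison: equal where both values are present and match,
# or where both values are NA
both_na   <- is.na(df3) & is.na(df4)
same_cell <- (cmp & !is.na(cmp)) | both_na
all_same  <- all(same_cell)

# A hypothetical data frame that differs only in the NA position
df5 <- df3
df5$num[6] <- 6L
cmp5      <- df3 == df5
same5     <- (cmp5 & !is.na(cmp5)) | (is.na(df3) & is.na(df5))
all_same5 <- all(same5)
```

Here all_same is TRUE, while all_same5 is FALSE because one side holds NA and the other holds 6.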
So using table() to compare datasets doesn't seem ideal to me. How can I better check if two data frames are identical?
P.S.: Note this is not a duplicate of R - comparing several datasets, Comparing 2 datasets in R or Compare datasets in R
Answer 1:
Look up all.equal(). It comes with some caveats (for instance, numeric values are compared up to a small tolerance rather than exactly), but it might work for you.
all.equal(df3,df4)
# [1] TRUE
all.equal(df2,df1)
# [1] TRUE
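One such caveat is the numeric tolerance. The sketch below uses two made-up data frames, x and y, whose values differ only by floating-point rounding. It is an illustration of the tolerance, not part of the original answer.

```r
# 0.1 + 0.2 is not exactly 0.3 in floating point
x <- data.frame(v = c(0.1 + 0.2, 1))
y <- data.frame(v = c(0.3, 1))

exact  <- identical(x, y)          # exact comparison
approx <- isTRUE(all.equal(x, y))  # comparison within a small tolerance
```

With these inputs exact is FALSE and approx is TRUE, so which function is appropriate depends on whether tiny numeric differences should count as differences.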
Answer 2:
As Metrics pointed out, one could also use identical() to compare the datasets. The difference between this approach and that of Codoremifa is that identical() will just yield TRUE or FALSE, depending on whether the objects being compared are identical or not, whereas all.equal() will either return TRUE or hints about the differences between the objects. For instance, consider the following:
> identical(df1, df3)
[1] FALSE
> all.equal(df1, df3)
[1] "Attributes: < Component 2: Numeric: lengths (5, 6) differ >"
[2] "Component 1: Numeric: lengths (5, 6) differ"
[3] "Component 2: Lengths: 5, 6"
[4] "Component 2: Attributes: < Component 2: Lengths (5, 6) differ (string compare on first 5) >"
[5] "Component 2: Lengths (5, 6) differ (string compare on first 5)"
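Because all.equal() returns a character vector rather than FALSE when the objects differ, its result should not be used directly as a condition. A minimal sketch of the usual isTRUE() wrapper, reusing df1 and df3 (cmp and same are illustrative names):

```r
df1 <- data.frame(num = 1:5, let = letters[1:5])
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])

# The result is a vector of messages describing the differences
cmp <- all.equal(df1, df3)

# isTRUE() collapses it to a single logical that is safe inside if()
same <- isTRUE(cmp)
if (same) {
  message("data frames match")
} else {
  message("data frames differ: ", length(cmp), " difference(s) reported")
}
```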
Moreover, from what I've tested, identical() seems to run much faster than all.equal().
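A rough way to check both points on your own machine is sketched below. It confirms that identical() treats matching NAs as equal and times both functions on a larger, made-up pair of data frames (big1, big2 and the size n are assumptions for illustration). Timings vary by machine and data, so no numbers are shown.

```r
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
df4 <- df3

# identical() treats an NA as matching an NA in the same position
same_with_na <- identical(df3, df4)

# Build two equal data frames from separate copies of the data
set.seed(1)
n    <- 1e5
vals <- rnorm(n)
big1 <- data.frame(num = seq_len(n), val = vals)
big2 <- data.frame(num = seq_len(n), val = vals + 0)  # + 0 forces a fresh vector

# Time each comparison; elapsed seconds depend on the machine
t_identical <- system.time(res_identical <- identical(big1, big2))[["elapsed"]]
t_all_equal <- system.time(res_all_equal <- isTRUE(all.equal(big1, big2)))[["elapsed"]]

print(c(identical = t_identical, all.equal = t_all_equal))
```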
Source: https://stackoverflow.com/questions/19119320/how-to-check-if-two-data-frames-are-equal