Is there an easy way to tell if many data frames stored in one list contain the same columns?

Question

I have a list containing many data frames:

df1 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df2 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df3 <- data.frame(A = 1:5, C = LETTERS[1:5])
my_list <- list(df1, df2, df3)

I want to know if every data frame in this list contains the same columns (i.e., the same number of columns, all having the same names and in the same order).

I know that you can easily find column names of data frames in a list using lapply:

lapply(my_list, colnames)

Is there a way to determine if any differences in column names occur? I realize this is a complicated question involving pairwise comparisons.

Answer 1:

Here's another base solution with Reduce:

!is.logical(
  Reduce(function(x,y) if(identical(x,y)) x else FALSE
         , lapply(my_list, names)
         )
)

You could also account for same columns in a different order with

!is.logical(
  Reduce(function(x,y) if(identical(x,y)) x else FALSE
         , lapply(my_list, function(z) sort(names(z)))
         )
)

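As for what's going on, Reduce() accumulates as it goes through the list. First, identical(names_df1, names_df2) is evaluated. If it is TRUE, the function returns that same names vector rather than TRUE, so it can be compared against the next member of the list. If it is FALSE, the function returns FALSE, which is never identical to a names vector, so the result stays FALSE for the rest of the list.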

Finally, if every comparison is TRUE, Reduce() returns a character vector. Since you probably want a logical output, !is.logical(...) is used to turn the result into a boolean: TRUE for a character vector, FALSE otherwise.
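
The accumulation described above can be made visible with Reduce()'s accumulate argument. This is only an illustrative sketch: keep_if_identical and steps are names introduced here, and the data frames are the ones from the question.

```r
df1 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df2 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df3 <- data.frame(A = 1:5, C = LETTERS[1:5])
my_list <- list(df1, df2, df3)

# Same comparison as in the answer: keep x while the names match, else FALSE.
keep_if_identical <- function(x, y) if (identical(x, y)) x else FALSE

# accumulate = TRUE returns every intermediate value, not only the last one.
steps <- Reduce(keep_if_identical, lapply(my_list, names), accumulate = TRUE)

# steps[[1]] is names(df1); steps[[2]] is the result after comparing with df2;
# steps[[3]] is the result after comparing with df3.
print(steps)
```

With the example list, the second step still holds the names vector and the third step is FALSE, which is why the final !is.logical(...) returns FALSE.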

See also this post, which inspired this answer:

check whether all elements of a list are in equal in R

And a similar one that I saw after my edit:

Test for equality between all members of list

Answer 2:

You can avoid pairwise comparison by simply checking whether the count of each column name is == length(my_list). This checks the number of columns and the names of your data frames at once, but not the column order -

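library(magrittr)  # provides %>%

lapply(my_list, names) %>%
  unlist() %>%
  table() %>%
  # braces stop %>% from also passing the table as the first argument of all()
  { all(. == length(my_list)) }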

[1] FALSE

In base R i.e. without %>% -

all(table(unlist(lapply(my_list, names))) == length(my_list))

[1] FALSE

or slightly more optimized -

!any(table(unlist(lapply(my_list, names))) != length(my_list))
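
Since the question asks for the same columns in the same order, it may help to see how the count-based check behaves when only the order differs. A small sketch, using two hypothetical data frames (dfa and dfb are not from the question):

```r
# Two hypothetical data frames with the same columns in a different order.
dfa <- data.frame(A = 1:3, B = 4:6)
dfb <- data.frame(B = 4:6, A = 1:3)
reordered <- list(dfa, dfb)

# Count-based check: each name appears once per data frame, so this is TRUE.
count_check <- all(table(unlist(lapply(reordered, names))) == length(reordered))

# Order-sensitive check: the two name vectors differ, so this is FALSE.
order_check <- identical(names(dfa), names(dfb))

print(c(count_check = count_check, order_check = order_check))
```

If column order matters, the count-based check would need to be combined with an order-sensitive comparison such as identical().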

Answer 3:

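We can use dplyr::bind_rows, which fills columns missing from a data frame with NA (this assumes the data contain no NA values of their own, and it does not check column order):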

!any(is.na(dplyr::bind_rows(my_list)))

 # [1] FALSE

Answer 4:

Here is my answer:

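k <- 1
output <- logical(0)
# seq_len() avoids the 1:0 trap when the list has a single element
for (i in seq_len(length(my_list) - 1)) {
  for (j in (i + 1):length(my_list)) {
    # compare the column names of every pair (i, j)
    output[k] <- identical(colnames(my_list[[i]]), colnames(my_list[[j]]))
    k <- k + 1
  }
}
all(output)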
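
If all pairs must be identical, comparing every data frame with the first one should be enough, which avoids the nested loop. A minimal sketch of that variant follows; same_columns is a hypothetical helper name, not part of the answer above.

```r
df1 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df2 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df3 <- data.frame(A = 1:5, C = LETTERS[1:5])
my_list <- list(df1, df2, df3)

# Hypothetical helper: TRUE if every data frame has the same column names,
# in the same order, as the first data frame in the list.
same_columns <- function(dfs) {
  # Zero or one data frame: there is nothing to compare.
  if (length(dfs) < 2) return(TRUE)
  reference <- names(dfs[[1]])
  all(vapply(dfs[-1], function(d) identical(names(d), reference), logical(1)))
}

same_columns(my_list)          # df3 lacks column B
same_columns(list(df1, df2))   # identical column names
```

This does length(dfs) - 1 comparisons instead of one per pair, at the cost of not recording which pairs differ.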

Source: https://stackoverflow.com/questions/56821533/is-there-an-easy-way-to-tell-if-many-data-frames-stored-in-one-list-contain-the

Tags

lapply