Is there an easy way to tell if many data frames stored in one list contain the same columns?

妖精的绣舞 submitted on 2019-12-10 18:49:10

Question


I have a list containing many data frames:

df1 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df2 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df3 <- data.frame(A = 1:5, C = LETTERS[1:5])
my_list <- list(df1, df2, df3)

I want to know if every data frame in this list contains the same columns (i.e., the same number of columns, all having the same names and in the same order).

I know that you can easily find column names of data frames in a list using lapply:

lapply(my_list, colnames)

Is there a way to determine if any differences in column names occur? I realize this is a complicated question involving pairwise comparisons.
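All three conditions in the question (same number of columns, same names, same order) reduce to a single identical() call on the name vectors, because two character vectors are identical only if they have the same length and the same elements in the same positions. A small sketch using the question's own data frames; the reordered copy of df1 is an assumption added for illustration:

```r
# The three data frames from the question
df1 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df2 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df3 <- data.frame(A = 1:5, C = LETTERS[1:5])

# Same count, same names, same order -> TRUE
same_12 <- identical(names(df1), names(df2))
# df3 has no column B, so its name vector is shorter -> FALSE
same_13 <- identical(names(df1), names(df3))
# Same names in a different order are also treated as different -> FALSE
same_reordered <- identical(names(df1), names(df1[, c("C", "A", "B")]))

print(c(same_12 = same_12, same_13 = same_13, same_reordered = same_reordered))
```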


Answer 1:


Here's another base solution with Reduce:

!is.logical(
  Reduce(function(x,y) if(identical(x,y)) x else FALSE
         , lapply(my_list, names)
         )
)

To treat the same columns in a different order as a match, sort the names before comparing them:

!is.logical(
  Reduce(function(x,y) if(identical(x,y)) x else FALSE
         , lapply(my_list, function(z) sort(names(z)))
         )
)
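To see the difference between the two versions, here is a sketch on a hypothetical two-element list (df_a and df_b are illustrative names, not from the question) whose data frames hold the same columns in a different order. The order-sensitive check reports FALSE, while the sorted version reports TRUE:

```r
# Hypothetical example: df_b has the same columns as df_a, reordered
df_a <- data.frame(A = 1:3, B = 4:6, C = letters[1:3])
df_b <- df_a[, c("C", "A", "B")]
reordered_list <- list(df_a, df_b)

# Order-sensitive check: names compared as they are
strict <- !is.logical(
  Reduce(function(x, y) if (identical(x, y)) x else FALSE,
         lapply(reordered_list, names))
)

# Order-insensitive check: names sorted before comparing
relaxed <- !is.logical(
  Reduce(function(x, y) if (identical(x, y)) x else FALSE,
         lapply(reordered_list, function(z) sort(names(z))))
)

print(c(strict = strict, relaxed = relaxed))
```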

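As for what's going on: Reduce() accumulates a result as it walks through the list. First, identical(names_df1, names_df2) is evaluated. If it is TRUE, the function returns that same name vector, so it can be carried forward and compared with the next element of the list. If it is FALSE, the function returns FALSE, and every later comparison also fails because FALSE is never identical to a character vector.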

Finally, if every comparison succeeds, the result is the shared character vector of names; otherwise it is FALSE. Since you probably want a logical output, !is.logical(...) converts this into TRUE (all names match) or FALSE (some differ).
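The carrying-forward described above can be made visible with Reduce()'s accumulate = TRUE argument, which keeps every intermediate value. A sketch on the question's data: the second element is still c("A", "B", "C") because df1 and df2 agree, and the third is FALSE because df3 lacks column B:

```r
df1 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df2 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df3 <- data.frame(A = 1:5, C = LETTERS[1:5])
my_list <- list(df1, df2, df3)

# Same reducer as in the answer; accumulate = TRUE returns every step.
# The results have different lengths, so they stay in a list.
steps <- Reduce(function(x, y) if (identical(x, y)) x else FALSE,
                lapply(my_list, names),
                accumulate = TRUE)
str(steps)
```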

See also this post, which inspired this answer:

check whether all elements of a list are in equal in R

And a similar one that I saw after my edit:

Test for equality between all members of list




Answer 2:


You can avoid pairwise comparisons by checking whether each column name occurs exactly length(my_list) times. Assuming no data frame has duplicated column names, this checks both the number and the names of the columns at once, although it does not check their order:

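library(magrittr)  # provides %>%

lapply(my_list, names) %>%
  unlist() %>%
  table() %>%
  {all(. == length(my_list))}  # braces stop %>% from also passing the table as all()'s first argument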

[1] FALSE

In base R, i.e. without %>%:

all(table(unlist(lapply(my_list, names))) == length(my_list))

[1] FALSE

or, equivalently:

!any(table(unlist(lapply(my_list, names))) != length(my_list))
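A sketch of the intermediate table these one-liners build, plus the column-order caveat mentioned above. The reordered list is a hypothetical example: it passes the count test even though its columns are not in the same order:

```r
df1 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df2 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df3 <- data.frame(A = 1:5, C = LETTERS[1:5])
my_list <- list(df1, df2, df3)

# How many data frames contain each column name (B is missing from df3)
counts <- table(unlist(lapply(my_list, names)))
print(counts)

# Caveat: the count test ignores column order
reordered_list <- list(df1, df1[, c("C", "A", "B")])
count_check <- all(table(unlist(lapply(reordered_list, names))) == length(reordered_list))
order_check <- identical(names(reordered_list[[1]]), names(reordered_list[[2]]))
print(c(count_check = count_check, order_check = order_check))
```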



Answer 3:


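We can use dplyr::bind_rows, which fills columns that are missing from some data frames with NA, and then check for NAs. This assumes the data contain no NA values of their own, and it ignores column order: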

!any(is.na(dplyr::bind_rows(my_list)))

 # [1] FALSE
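A sketch (assuming dplyr is installed) that counts the NAs per column to show which column is not shared, and demonstrates the NA caveat with a hypothetical list whose columns match but whose data already contain an NA:

```r
df1 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df2 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df3 <- data.frame(A = 1:5, C = LETTERS[1:5])
my_list <- list(df1, df2, df3)

# bind_rows() pads B with NA for the 5 rows that came from df3
na_per_column <- colSums(is.na(dplyr::bind_rows(my_list)))
print(na_per_column)

# Caveat: an NA already in the data looks the same as padding,
# so identical columns can still be reported as FALSE
with_na <- list(data.frame(A = c(1, NA)), data.frame(A = 2))
na_check <- !any(is.na(dplyr::bind_rows(with_na)))
print(na_check)
```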



Answer 4:


Here is my answer:

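# Compare the column names of every pair of data frames in the list
n <- length(my_list)
output <- logical(0)
for (i in seq_len(n - 1)) {      # seq_len() also handles a one-element list
  for (j in seq(i + 1, n)) {
    output <- c(output, identical(colnames(my_list[[i]]), colnames(my_list[[j]])))
  }
}
all(output)  # TRUE only if every pair matched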
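Because identical() is transitive, the n * (n - 1) / 2 pairwise comparisons in the loop are more than needed: if every data frame matches the first one, they all match each other, so n - 1 comparisons suffice. A minimal sketch of that shortcut; the helper name same_columns() is hypothetical, not from any package:

```r
# Hypothetical helper: TRUE if every data frame has exactly the same
# column names, in the same order, as the first one
same_columns <- function(dfs) {
  if (length(dfs) < 2) return(TRUE)
  first <- names(dfs[[1]])
  all(vapply(dfs[-1], function(d) identical(names(d), first), logical(1)))
}

df1 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df2 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df3 <- data.frame(A = 1:5, C = LETTERS[1:5])

all_three <- same_columns(list(df1, df2, df3))  # df3 lacks B
first_two <- same_columns(list(df1, df2))
print(c(all_three = all_three, first_two = first_two))
```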


Source: https://stackoverflow.com/questions/56821533/is-there-an-easy-way-to-tell-if-many-data-frames-stored-in-one-list-contain-the
