R: compare column types between two data frames

Posted by 假如想象 on 2019-12-08 06:03:52

Question


This may be a bad question because I am not posting a reproducible example. My main goal is to identify columns whose types differ between two data frames that have the same column names.

For example:

df1

 Id      Col1      Col2     Col3
 Numeric Factor    Integer  Date

df2

 Id      Col1      Col2     Col3
 Numeric Numeric   Integer  Date

Here both data frames (df1, df2) have the same column names, but the type of Col1 differs, and I want to identify such columns. Expected output:

Col1  Factor    Numeric

Any suggestions or tips on achieving this? Thanks.


Answer 1:


Try this:

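compareColumns <- function(df1, df2) {
  # Columns present in both data frames
  commonNames <- intersect(names(df1), names(df2))
  # Collapse multi-class columns (e.g. POSIXct) into a single string
  cls <- function(x) paste(class(x), collapse = "/")
  data.frame(Column = commonNames,
             df1 = vapply(df1[commonNames], cls, character(1)),
             df2 = vapply(df2[commonNames], cls, character(1)),
             stringsAsFactors = FALSE)
}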
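
The function above lists the class of every shared column; the question asks only for the columns that differ. A minimal sketch of that last filtering step follows. The data frames `x` and `y` and their contents are made up for illustration, and the function is repeated (same idea as above) so the snippet runs on its own.

```r
# Same idea as compareColumns() above, repeated so this runs standalone.
compareColumns <- function(df1, df2) {
  commonNames <- names(df1)[names(df1) %in% names(df2)]
  data.frame(Column = commonNames,
             df1 = sapply(df1[commonNames], class),
             df2 = sapply(df2[commonNames], class),
             stringsAsFactors = FALSE)
}

# Hypothetical example data: Col1 is a factor in x and numeric in y.
x <- data.frame(Id = c(1, 2), Col1 = factor(c("a", "b")), Col2 = 1:2)
y <- data.frame(Id = c(1, 2), Col1 = c(0.5, 1.5), Col2 = 1:2)

res <- compareColumns(x, y)
# Keep only the rows where the two classes disagree.
mismatches <- res[res$df1 != res$df2, ]
print(mismatches)
```

This assumes every column has a single class; a column with several classes (such as a date-time) would need its class vector collapsed first.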



Answer 2:


For a more compact method, you could use a list with sapply(). Efficiency shouldn't be a problem here, since all we're doing is grabbing the class of each column. Here I add the data frame names to the list to produce clearer output.

m <- sapply(list(df1 = df1, df2 = df2), sapply, class)
m[m[, "df1"] != m[, "df2"], , drop = FALSE]
#      df1      df2        
# Col1 "factor" "character"

Here df1 and df2 are the data frames from @ycw's answer.
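
One caveat with the sapply(df, class) pattern: class() can return more than one value (a POSIXct column has the classes "POSIXct" and "POSIXt"), in which case sapply() returns a list instead of a character vector and the matrix comparison no longer works as written. A possible workaround, sketched here under that assumption, is to collapse each class vector into a single string. The helper name `col_classes` and the data frames `a` and `b` are hypothetical.

```r
# Collapse each column's class vector into one string so the result is
# always a character vector, even for multi-class columns such as POSIXct.
col_classes <- function(df) {
  vapply(df, function(x) paste(class(x), collapse = "/"), character(1))
}

# Hypothetical example data: `when` is a Date in a and a POSIXct in b.
a <- data.frame(id = 1:2, when = as.Date(c("2017-01-02", "2017-01-03")))
b <- data.frame(id = 1:2,
                when = as.POSIXct(c("2017-01-02", "2017-01-03"), tz = "UTC"))

# One row per column, one matrix column per data frame.
m <- cbind(a = col_classes(a), b = col_classes(b))
mismatch <- m[m[, "a"] != m[, "b"], , drop = FALSE]
print(mismatch)
```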




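Answer 3:


If the two data frames have the same column names in the same order, the code below returns the columns whose classes differ.

library(dplyr)
m1 = mtcars
# Convert two columns to factors so their classes differ from m1
m2 = mtcars %>% mutate(cyl = factor(cyl), vs = factor(vs))
# One row per column: class in m1, class in m2
out = cbind(sapply(m1, class), sapply(m2, class))
# Keep rows whose classes differ; drop = FALSE keeps a matrix even for one row
out[apply(out, 1, function(x) !identical(x[1], x[2])), , drop = FALSE]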
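
The same positional comparison can be sketched in base R without dplyr by applying identical() to the full class vectors, column by column. This is an alternative sketch, not the answer's code; it assumes both data frames have the same columns in the same order, and the names `same` and `diff_cols` are made up for illustration.

```r
m1 <- mtcars
# transform() replaces the existing columns in place, keeping column order.
m2 <- transform(mtcars, cyl = factor(cyl), vs = factor(vs))

# Compare the complete class vector of each pair of columns by position.
same <- mapply(function(a, b) identical(class(a), class(b)), m1, m2)

# Names of the columns whose classes differ.
diff_cols <- names(same)[!same]
print(diff_cols)
```

Because identical() compares whole class vectors, this also copes with columns that carry more than one class.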



Answer 4:


Try compare_df_cols() from the janitor package:

library(janitor)
mtcars2 <- mtcars
mtcars2$cyl <- as.character(mtcars2$cyl)
compare_df_cols(mtcars, mtcars2, return = "mismatch")

#>   column_name  mtcars   mtcars2
#> 1         cyl numeric character

Self-promotion alert: I authored this package. I am posting this function because it exists to solve precisely this problem.




Answer 5:


We can use sapply with class to loop through all columns in df1 and df2. After that, we can compare the results.

# Create example data frames
df1 <- data.frame(ID = 1:3,
                  Col1 = as.character(2:4),
                  Col2 = 2:4,
                  Col3 = as.Date(paste0("2017-01-0", 2:4)),
                  stringsAsFactors = TRUE)  # explicit: the default became FALSE in R 4.0.0

df2 <- data.frame(ID = 1:3,
                  Col1 = as.character(2:4),
                  Col2 = 2:4,
                  Col3 = as.Date(paste0("2017-01-0", 2:4)),
                  stringsAsFactors = FALSE)

# Use sapply and class to find the class of every column
class1 <- sapply(df1, class)
class2 <- sapply(df2, class)

# Combine the results, then keep only the rows where the classes differ
result <- data.frame(class1, class2, stringsAsFactors = FALSE)
result[result$class1 != result$class2, ]
#      class1    class2
# Col1 factor character


Source: https://stackoverflow.com/questions/45743991/r-compare-column-types-between-two-dataframes
