Delete duplicate columns?

旧街凉风 submitted on 2021-01-28 01:10:57

Question


I am collating multiple excel files into one using data frames. There are duplicate columns in the files. Is it possible to merge only the unique columns?

Here is my code:

library(rJava)
library(XLConnect)

data.files = list.files(pattern = "*.xls")

# Read the first file
df = readWorksheetFromFile(file = data.files[1], sheet = 1, check.names = FALSE)

# Loop through the remaining files and merge them to the existing data frame
for (file in data.files[-1]) {
    newFile = readWorksheetFromFile(file = file, sheet = 1, check.names = FALSE)
    df = merge(df, newFile, all = TRUE, check.names = FALSE)
}

Answer 1:


First of all, if you apply merge correctly, there shouldn't be any duplicated columns, provided that the duplicated columns have exactly the same name in each Excel file. By default, merge joins on every column name the two data frames share, so there must be at least one column that has exactly the same name in both files and contains the values used to match the rows.
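To illustrate the point about column names, here is a minimal sketch with two small in-memory data frames standing in for Excel sheets. The data and the names `a`, `b`, `a2`, `b2` are made up for illustration. It shows that a shared column used as the merge key appears only once, while a shared column left out of the key is kept twice with suffixes.

```r
# Two toy data frames standing in for two Excel sheets (hypothetical data).
a <- data.frame(id = 1:3, height = c(10, 20, 30))
b <- data.frame(id = 2:4, weight = c(5, 6, 7))

# With no `by` argument, merge() joins on the columns both frames share,
# here just "id", so "id" appears once in the result.
merged <- merge(a, b, all = TRUE)
print(names(merged))

# If a shared column is excluded from the key, merge() keeps both copies
# and disambiguates them with the default suffixes ".x" and ".y".
a2 <- cbind(a, note = c("x", "y", "z"))
b2 <- cbind(b, note = c("y", "z", "w"))
suffixed <- merge(a2, b2, by = "id", all = TRUE)
print(names(suffixed))
```

So duplicated columns after a merge usually mean either that the names differed between files or that `by` was restricted to a subset of the shared names.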

So I reckon you want to check in the resulting data frame whether there are duplicate columns based on the values in each column. For this, you could use the following:

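keepUnique <- function(x){
  # All pairs of column names: row 1 holds the first column of each pair, row 2 the second
  combs <- combn(names(x),2)

  # TRUE for each pair whose two columns hold identical values
  dups <- mapply(identical,
                 x[combs[1,]],
                 x[combs[2,]])

  # Drop the second column of every duplicated pair
  drop <- combs[2,][dups]
  x[ !names(x) %in% drop ]
}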

Which gives:

> mydf <- cbind(iris,iris[,3])[1:5,]
> mydf
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species iris[, 3]
1          5.1         3.5          1.4         0.2  setosa       1.4
2          4.9         3.0          1.4         0.2  setosa       1.4
3          4.7         3.2          1.3         0.2  setosa       1.3
4          4.6         3.1          1.5         0.2  setosa       1.5
5          5.0         3.6          1.4         0.2  setosa       1.4
> keepUnique(mydf)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
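As an alternative sketch (not the method used in this answer), the same value-based check can be written with base R's `duplicated()` applied to the list of columns. The helper name `dropDuplicateColumns` is hypothetical. This variant does not look columns up by name, so it should also cope with a single-column data frame, where `combn(names(x), 2)` would fail.

```r
# Drop columns whose values duplicate an earlier column, ignoring names.
# `dropDuplicateColumns` is a hypothetical helper name, not part of the answer.
dropDuplicateColumns <- function(x) {
  # as.list() turns the data frame into a list of column vectors;
  # duplicated() then flags every column whose values match an earlier one.
  x[!duplicated(as.list(x))]
}

# Same example data as above: iris plus a copy of its third column.
mydf <- cbind(iris, iris[, 3])[1:5, ]
result <- dropDuplicateColumns(mydf)
print(names(result))
```

Like `keepUnique`, this keeps the first of each set of identical columns and drops the later ones.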

You can use this after reading in a file, i.e. add the line

newFile <- keepUnique(newFile)

in your own code.
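A rough sketch of how this might slot into the merge loop is shown here, with in-memory data frames standing in for the Excel files so that it runs without XLConnect. The `sheets` list and its contents are made up for illustration. It also applies `keepUnique` once more to the merged result, on the assumption that duplicates may only become visible after columns from different files sit side by side.

```r
# keepUnique as defined in the answer: drops the later column of each identical pair.
keepUnique <- function(x) {
  combs <- combn(names(x), 2)
  dups <- mapply(identical, x[combs[1, ]], x[combs[2, ]])
  drop <- combs[2, ][dups]
  x[!names(x) %in% drop]
}

# Hypothetical stand-ins for the data frames read from each Excel file.
sheets <- list(
  data.frame(id = c("x", "y", "z"), a = c(1, 2, 3)),
  data.frame(id = c("x", "y", "z"), a_copy = c(1, 2, 3), b = c(7, 8, 9))
)

df <- sheets[[1]]
for (sheet in sheets[-1]) {
  # Clean each new sheet, then merge on the shared column name(s).
  df <- merge(df, keepUnique(sheet), all = TRUE)
}

# "a" and "a_copy" come from different sheets, so only this final pass sees both.
df <- keepUnique(df)
print(names(df))
```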



Source: https://stackoverflow.com/questions/22568946/delete-duplicate-columns
