Reduce() in R over similar variable names causing error

萝らか妹 posted on 2019-12-10 20:11:14

Question


I have 19 nested lists generated from an lapply and split operation. Each list has this form:

#list1
Var col1 col2 col3
A    2     3    4
B    3     4    5

#list2
Var col1 col2 col3
A    5     6    7
B    5     4    4

......

#list19
Var col1 col2 col3
A    3     6    7
B    7     4    4

I have been able to merge the lists with

merge.all <- function(x, y) merge(x, y, all=TRUE, by="Var")
out <- Reduce(merge.all, DataList)

However, I am getting an error because the remaining columns have the same names in every list.
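A minimal sketch of where the name clash comes from, assuming four small made-up data frames that share the same column names (the `frames` list and its values are illustrative, not the asker's data). Base `merge()` appends `.x`/`.y` suffixes to clashing columns, and once those suffixed names already exist from an earlier step, a later merge can generate them a second time:

```r
# Illustrative data: four data frames with identical column names.
frames <- lapply(1:4, function(i) {
  data.frame(Var = c("A", "B"), col1 = i, col2 = i * 10)
})
names(frames) <- paste0("list", 1:4)

merge_all <- function(x, y) merge(x, y, all = TRUE, by = "Var")

# suppressWarnings() only keeps the demo quiet; without it, merge() is
# expected to warn about duplicated column names in the result.
out <- suppressWarnings(Reduce(merge_all, frames))

names(out)                      # inspect the suffixed names
anyDuplicated(names(out)) > 0   # TRUE when some names are repeated
```

Renaming the columns before merging, as the answers suggest, avoids relying on the suffixes at all.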

How can I concatenate the name of the list to the variable names so that I get something like this:

Var list1.col1 list1.col2 list1.col3  ..........   list19.col3
 A    2          3          4                            7 
 B    3          4          5          ..........        4

Answer 1:


I'm really sure somebody will come up with a much, much better solution. However, if you're after a quick and dirty solution, this seems to work.

My plan was to simply change the column names prior to merging.

#Sample Data
df1 <- data.frame(Var = c("A","B"), col1 = c(2,3), col2 = c(3,4), col3 = c(4,5))
df2 <- data.frame(Var = c("A","B"), col1 = c(5,5), col2 = c(6,4), col3 = c(7,5))
df19 <- data.frame(Var = c("A","B"), col1 = c(3,7), col2 = c(6,4), col3 = c(7,4))

mylist <- list(df1, df2, df19)
names(mylist) <- c("df1", "df2", "df19") #just manually naming, presumably your list has names


## Change the column names by pasting the name of each data frame in the list onto the standard column names, using an ugly mix of `lapply` and a `for` loop:

mycolnames <- colnames(df1)
mycolnames1 <- lapply(names(mylist), function(x) paste0(x, mycolnames)) 


for (i in seq_along(mylist)) { # seq_along() is safe even if the list is empty
  colnames(mylist[[i]]) <- mycolnames1[[i]]
  colnames(mylist[[i]])[1] <- "Var" #put Var back in so you can merge
}



## Merge
merge.all <- function(x, y)
  merge(x, y, all=TRUE, by="Var")

out <- Reduce(merge.all, mylist)
out


#  Var df1col1 df1col2 df1col3 df2col1 df2col2 df2col3 df19col1 df19col2 df19col3
#1   A       2       3       4       5       6       7        3        6        7
#2   B       3       4       5       5       4       5        7        4        4

There you go - it works but is very ugly.
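For comparison, the same rename-then-merge idea can be sketched without the loop. `prefix_cols` is a hypothetical helper name (not from the answer above), and the `.` separator is an assumption chosen to match the `list1.col1` style requested in the question:

```r
# Hypothetical helper: prefix every column except the merge key with the
# name of the list element the data frame came from.
prefix_cols <- function(dfs, key = "Var", sep = ".") {
  Map(function(df, nm) {
    rename <- names(df) != key
    names(df)[rename] <- paste(nm, names(df)[rename], sep = sep)
    df
  }, dfs, names(dfs))
}

# Two of the data frames from the question, as a named list.
mylist <- list(
  list1 = data.frame(Var = c("A", "B"), col1 = c(2, 3), col2 = c(3, 4), col3 = c(4, 5)),
  list2 = data.frame(Var = c("A", "B"), col1 = c(5, 5), col2 = c(6, 4), col3 = c(7, 4))
)

merge_all <- function(x, y) merge(x, y, all = TRUE, by = "Var")
out <- Reduce(merge_all, prefix_cols(mylist))
names(out)
```

Because every non-key column is unique before `merge()` runs, no `.x`/`.y` suffixes should be needed. The list must be named for this to work.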




Answer 2:


To make the column names unique, you could use a function that prefixes every column other than the merging variable with the name of its list element.

resetNames <- function(x, byvar = "Var") {
    asrl <- as.relistable(lapply(x, names))
    allnm <- names(unlist(x, recursive = FALSE))
    rpl <- replace(allnm, unlist(asrl) %in% byvar, byvar)
    Map(setNames, x, relist(rpl, asrl))
}

Reduce(merge.all, resetNames(dlist))
#  Var list1.col1 list1.col2 list1.col3 list2.col1 list2.col2 list2.col4 list3.col1
#1   A          2          3          4          5          6          7          3
#2   B          3          4          5          5          4          4          7
#  list3.col2 list3.col3 list4.col1 list4.col2 list4.col3
#1          6          7          3          6          7
#2          4          4          4          5          6
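The step that does the real work in `resetNames` is `names(unlist(x, recursive = FALSE))`: flattening a named list of data frames by one level yields one element per column, named `<list name>.<column name>`. A small sketch of that step alone, using a made-up two-element list (`dl` and `flat_names` are illustrative names):

```r
# Illustrative input: a named list of two tiny data frames.
dl <- list(
  list1 = data.frame(Var = "A", col1 = 1),
  list2 = data.frame(Var = "A", col1 = 2)
)

# Flatten one level: each column becomes its own list element, and R
# builds the names by joining the outer and inner names with a dot.
flat_names <- names(unlist(dl, recursive = FALSE))
flat_names
# "list1.Var"  "list1.col1" "list2.Var"  "list2.col1"
```

`resetNames` then swaps the prefixed key names (`list1.Var`, `list2.Var`) back to plain `Var` and uses `relist()` to cut the flat vector back into one name vector per data frame.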

When I run this on your list with an added data frame, there are no warnings. There is also always data.table: its merge method does not return a warning for duplicated column names.

library(data.table)
Reduce(merge.all, lapply(dlist, as.data.table))

Another option is to check the names as the data enters the function, change them there, and then you can avoid the warning. This isn't perfect but it works ok here.

merge.all <- function(x, y) {
    m <- match(names(y)[-1], gsub("[.](x|y)$", "", names(x)[-1]), 0L)
    names(y)[-1][m] <- paste0(names(y)[-1][m], "DUPE")
    merge(x, y, all=TRUE, by="Var")
}

rm <- Reduce(merge.all, dlist)
names(rm)
#  [1] "Var"        "col1"       "col2"       "col3"       "col1DUPE.x"
#  [6] "col2DUPE.x" "col4"       "col1DUPE.y" "col2DUPE.y" "col3DUPE.x"
# [11] "col1DUPE"   "col2DUPE"   "col3DUPE.y"

where dlist is

structure(list(list1 = structure(list(Var = structure(1:2, .Label = c("A", 
"B"), class = "factor"), col1 = 2:3, col2 = 3:4, col3 = 4:5), .Names = c("Var", 
"col1", "col2", "col3"), class = "data.frame", row.names = c(NA, 
-2L)), list2 = structure(list(Var = structure(1:2, .Label = c("A", 
"B"), class = "factor"), col1 = c(5L, 5L), col2 = c(6L, 4L), 
    col4 = c(7L, 4L)), .Names = c("Var", "col1", "col2", "col4"
), class = "data.frame", row.names = c(NA, -2L)), list3 = structure(list(
    Var = structure(1:2, .Label = c("A", "B"), class = "factor"), 
    col1 = c(3L, 7L), col2 = c(6L, 4L), col3 = c(7L, 4L)), .Names = c("Var", 
"col1", "col2", "col3"), class = "data.frame", row.names = c(NA, 
-2L)), list4 = structure(list(Var = structure(1:2, .Label = c("A", 
"B"), class = "factor"), col1 = 3:4, col2 = c(6L, 5L), col3 = c(7L, 
6L)), .Names = c("Var", "col1", "col2", "col3"), row.names = c(NA, 
-2L), class = "data.frame")), .Names = c("list1", "list2", "list3", 
"list4"))


Source: https://stackoverflow.com/questions/28378637/reduce-in-r-over-similar-variable-names-causing-error
