Merge two data frames to get alternate rows of each data frame in sequence

谁说我不能喝 提交于 2021-02-07 06:32:17

问题


My data.frame DATA is

  k    l   g
1 A 2004  12
2 B 2004 3.4
3 C 2004 4.5

Another data.frame DATA2 is

  i    d   t
1 A 2012  22
2 B 2012 4.8
3 C 2012 5.6

I want to get

1 A 2004  12
1 A 2012  22
2 B 2004 3.4
2 B 2012 4.8
3 C 2004 4.5
3 C 2012 5.6

回答1:


We can try rbindlist from data.table. Place the datasets in a list, rbind them with rbindlist and order by the first column.

library(data.table)
rbindlist(list(df1, df2))[order(k)]
#   k    l    g
#1: A 2004 12.0
#2: A 2012 22.0
#3: B 2004  3.4
#4: B 2012  4.8
#5: C 2004  4.5
#6: C 2012  5.6

Or using dplyr

library(dplyr)
bind_rows(df1, setNames(df2, names(df1))) %>% 
           arrange(k)

NOTE: I used df1 and df2 in place of DATA and DATA2 as object names as it is easier to type.




回答2:


You can try the interleave function from the "gdata" package. However, this would require your inputs have the same column names and have the same number of rows.

The approach would be:

library(gdata)      # for interleave
do.call(interleave, lapply(list(df1, df2), setNames, paste0("V", 1:ncol(df1))))
#    V1   V2   V3
# 1   A 2004 12.0
# 11  A 2012 22.0
# 2   B 2004  3.4
# 21  B 2012  4.8
# 3   C 2004  4.5
# 31  C 2012  5.6

Alternatively, as mentioned in my comment @akrun's answer, depending on whether the first column is a grouping variable or not, you may want to modify his approach a little.

For instance, imagine there were a third data.frame, with a different number of rows than the others. interleave would not work on that, but the rbindlist approach would.

df3 <- do.call(rbind, lapply(list(df1, df2), setNames, c("A", "B", "Z")))

rbindlist(list(df1, df2, df3), idcol = TRUE)[, N := sequence(.N), by = .id][order(N)]
#     .id k    l    g N
#  1:   1 A 2004 12.0 1
#  2:   2 A 2012 22.0 1
#  3:   3 A 2004 12.0 1
#  4:   1 B 2004  3.4 2
#  5:   2 B 2012  4.8 2
#  6:   3 B 2004  3.4 2
#  7:   1 C 2004  4.5 3
#  8:   2 C 2012  5.6 3
#  9:   3 C 2004  4.5 3
# 10:   3 A 2012 22.0 4
# 11:   3 B 2012  4.8 5
# 12:   3 C 2012  5.6 6

Pay specific attention to the last three rows in comparison with @akrun's approach.


The equivalent in base R for that last "data.table" approach would be something like:

x <- do.call(rbind, lapply(c("df1", "df2", "df3"), function(x) {
  setNames(cbind(rn = x, get(x)), c("id", paste0("V", 1:ncol(get(x)))))
}))
x[order(ave(as.numeric(x$id), x$id, FUN = seq_along)), ]

(So the moral is, use "data.table".)




回答3:


You can also do this in base R with rbind without the need to use extra packages, but you will have to set the columnnames of df2 the same as the columnnames in df1:

colnames(df2) <- colnames(df1) # or: setNames(df2, colnames(df1))
new.df <- rbind(df1,df2)
new.df <- new.df[order(new.df$k),]

This will result in the following dataframe:

> new.df
   k    l    g
1  A 2004 12.0
11 A 2012 22.0
2  B 2004  3.4
21 B 2012  4.8
3  C 2004  4.5
31 C 2012  5.6


来源:https://stackoverflow.com/questions/35356989/merge-two-data-frames-to-get-alternate-rows-of-each-data-frame-in-sequence

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!