How to combine two data frames using dplyr or other packages?

拈花ヽ惹草 提交于 2021-02-09 15:45:48

问题


I have two data frames:

df1 = data.frame(index=c(0,3,4),n1=c(1,2,3))  
df1
#  index n1
# 1    0  1
# 2    3  2
# 3    4  3

df2 = data.frame(index=c(1,2,3),n2=c(4,5,6))  
df2
#   index n2
# 1     1  4
# 2     2  5
# 3     3  6

I want to join these to:

  index n
1     0 1
2     1 4
3     2 5
4     3 8  (index 3 in two df, so add 2 and 6 in each df)
5     4 3
6     5 0  (index 5 not exists in either df, so set 0)
7     6 0  (index 6 not exists in either df, so set 0)

The given data frames are just part of large dataset. Can I do it using dplyr or other packages in R?


回答1:


Using data.table (would be efficient for bigger datasets). I am not changing the column names, as the rbindlist uses the name of the first dataset ie. in this case n from the second column (Don't know if it is a feature or bug). Once you join the datasets by rbindlist, group it by column index i.e. (by=index) and do the sum of n column (list(n=sum(n)) )

 library(data.table)
 rbindlist(list(data.frame(index=0:6,n=0), df1,df2))[,list(n=sum(n)), by=index]
     index n
 #1:     0 1
 #2:     1 4
 #3:     2 5
 #4:     3 8
 #5:     4 3
 #6:     5 0
 #7:     6 0

Or using dplyr. Here, the column names of all the datasets should be the same. So, I am changing it before binding the datasets using rbind_list. If the names are different, there will be multiple columns for each name. After joining the datasets, group it by index and then use summarize and do the sum of column n.

 library(dplyr)
 nm1 <- c("index", "n") 
 colnames(df1) <- colnames(df2) <- nm1 
 rbind_list(df1,df2, data.frame(index=0:6, n=0)) %>% 
                                          group_by(index) %>%
                                          summarise(n=sum(n))



回答2:


This is something you could do with the base functions aggregate and rbind

df1 = data.frame(index=c(0,3,4),n=c(1,2,3))  
df2 = data.frame(index=c(1,2,3),n=c(4,5,6))  
aggregate(n~index, rbind(df1, df2, data.frame(index=0:6, n=0)), sum)

which returns

  index n
1     0 1
2     1 4
3     2 5
4     3 8
5     4 3
6     5 0
7     6 0



回答3:


How about

names(df1) <- c("index", "n")                # set colnames of df1 to target
df3 <- rbind(df1,setNames(df2, names(df1)))  # set colnnames of df2 and join
df <- df3 %>% dplyr::arrange(index)          # sort by index

Cheers.



来源:https://stackoverflow.com/questions/26520644/how-to-combine-two-data-frames-using-dplyr-or-other-packages

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!