merge/combine columns with same name but incomplete data

臣服心动 2020-12-14 17:57

I have two data frames that have some columns with the same names and others with different names. The data frames look something like this:

df1
      ID hel         


        
7 answers
  • 2020-12-14 18:38

    Here is a more tidyr-centric approach that does something similar to the currently accepted answer. The idea is to stack the data frames on top of each other with bind_rows (which matches columns by name), gather all the non-ID columns into long form with na.rm = TRUE, and then spread them back out. Compared with a summarise-based option, this should be robust to cases where the condition "if the value is NA in df1 it would have a value in df2 (and vice versa)" doesn't always hold.

    library(tidyverse)
    df1 <- structure(list(ID = 1:5, hello = c(NA, NA, 10L, 4L, NA), world = c(NA, NA, 8L, 17L, NA), hockey = c(7L, 2L, 8L, 5L, 3L), soccer = c(4L, 5L, 23L, 12L, 43L)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(ID = structure(list(), class = c("collector_integer", "collector")), hello = structure(list(), class = c("collector_integer", "collector")), world = structure(list(), class = c("collector_integer", "collector")), hockey = structure(list(), class = c("collector_integer", "collector")), soccer = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))
    df2 <- structure(list(ID = 1:5, hello = c(2L, 5L, NA, NA, 9L), world = c(3L, 1L, NA, NA, 7L), football = c(43L, 24L, 2L, 5L, 12L), baseball = c(6L, 32L, 23L, 15L, 2L)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(ID = structure(list(), class = c("collector_integer", "collector")), hello = structure(list(), class = c("collector_integer", "collector")), world = structure(list(), class = c("collector_integer", "collector")), football = structure(list(), class = c("collector_integer", "collector")), baseball = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))
    
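    # Stack the two data frames; bind_rows() matches columns by name
    df1 %>%
      bind_rows(df2) %>%
      # Reshape to long form and drop the NA values
      gather(variable, value, -ID, na.rm = TRUE) %>%
      # Reshape back to one column per variable
      spread(variable, value)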
    #> # A tibble: 5 x 7
    #>      ID baseball football hello hockey soccer world
    #>   <int>    <int>    <int> <int>  <int>  <int> <int>
    #> 1     1        6       43     2      7      4     3
    #> 2     2       32       24     5      2      5     1
    #> 3     3       23        2    10      8     23     8
    #> 4     4       15        5     4      5     12    17
    #> 5     5        2       12     9      3     43     7
    

    Created on 2018-07-13 by the reprex package (v0.2.0).
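
    In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same stack, lengthen, widen idea using those functions. It is not part of the original reprex: the data frames are rebuilt with tibble() for brevity, and the name combined is only illustrative.

    ```r
    library(dplyr)
    library(tidyr)

    # Same example data as above, rebuilt without the dput() attributes
    df1 <- tibble(
      ID     = 1:5,
      hello  = c(NA, NA, 10L, 4L, NA),
      world  = c(NA, NA, 8L, 17L, NA),
      hockey = c(7L, 2L, 8L, 5L, 3L),
      soccer = c(4L, 5L, 23L, 12L, 43L)
    )
    df2 <- tibble(
      ID       = 1:5,
      hello    = c(2L, 5L, NA, NA, 9L),
      world    = c(3L, 1L, NA, NA, 7L),
      football = c(43L, 24L, 2L, 5L, 12L),
      baseball = c(6L, 32L, 23L, 15L, 2L)
    )

    combined <- bind_rows(df1, df2) %>%
      # Long form: one row per (ID, variable) pair, NA values dropped
      pivot_longer(-ID, names_to = "variable", values_to = "value",
                   values_drop_na = TRUE) %>%
      # Wide form again: one column per variable
      pivot_wider(names_from = variable, values_from = value)

    print(combined)
    ```

    The values should match the gather/spread result above, although the column order may differ, since pivot_wider() does not sort the new columns alphabetically by default.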
