Merge/combine columns with the same name but incomplete data

臣服心动 2020-12-14 17:57

I have two data frames that have some columns with the same names and others with different names. The data frames look something like this:

df1
      ID hel

7 answers

谎友^ (original poster) 2020-12-14 18:24

Using the tidyverse, we can rely on coalesce().

None of the solutions below builds extra rows; the data keeps more or less the same size and a similar shape throughout the chain.
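
For readers who have not met coalesce() before, here is a minimal sketch of what it does on its own. The vectors x and y are made up for illustration and are not taken from the question's data.

```r
library(dplyr)

# Hypothetical vectors, not the question's data
x <- c(1, NA, 3, NA)
y <- c(9, 2, NA, NA)

# For each position, keep the first value that is not NA
filled <- coalesce(x, y)
print(filled)
```

Positions where x already has a value keep it, positions where only y has a value take it from y, and a position that is NA on both sides stays NA.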

Solution 1

library(tidyverse)  # provides %>%, transpose(), compact(), invoke(), map_dfc(), coalesce()

list(df1, df2) %>%
  transpose(union(names(df1), names(df2))) %>%
  map_dfc(. %>% compact %>% invoke(coalesce, .))

# # A tibble: 5 x 7
#      ID hello world football baseball hockey soccer
# 1     1     2     3       43        6      7      4
# 2     2     5     1       24       32      2      5
# 3     3    10     8        2       23      8     23
# 4     4     4    17        5       15      5     12
# 5     5     9     7       12       23      3     43


Explanations


- Wrap both data frames in a list.
- transpose() it, so that each top-level item is named after a column of the output. By default, transpose() uses the first element as a template, so unfortunately we have to pass the names explicitly to get all of them.
- compact() these items: each one has length 2, but one of its elements is NULL when the given column is missing on one side.
- coalesce() those, which basically means: putting the arguments side by side, return the first non-NA value found at each position.
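
The steps above can be followed one at a time on a toy example. This is only a sketch: the two small data frames are hypothetical stand-ins (the question's data is not reproduced here), and do.call() is swapped in for invoke(), which recent purrr versions deprecate.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-ins for df1 and df2 (not the question's data)
df1 <- data.frame(ID = 1:3, hello = c(2, NA, 10), football = c(43, 24, 2))
df2 <- data.frame(ID = 1:3, hello = c(NA, 5, NA), soccer = c(4, 5, 23))

# Steps 1 and 2: wrap in a list, then transpose using every column name
cols   <- union(names(df1), names(df2))
by_col <- transpose(list(df1, df2), cols)

# A column missing on one side shows up as NULL on that side
missing_side <- by_col$soccer[[1]]   # df1 has no "soccer" column

# Step 3: drop the NULL entries
compacted <- map(by_col, compact)

# Step 4: coalesce what is left and bind the results as columns
# (do.call() is used here instead of invoke(); this is a substitution)
result <- map_dfc(compacted, ~ do.call(coalesce, .x))
print(result)
```

Inspecting by_col and compacted in the console is a quick way to see why the explicit names are needed: without them, columns that only exist in df2 would be dropped.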


If repeating df1 and df2 on the second line is an issue, use the following as the second line instead:

transpose(invoke(union, setNames(map(., names), c("x","y"))))


Solution 2

Same philosophy, but this time we loop over the column names:

map_dfc(set_names(union(names(df1), names(df2))),
        ~ invoke(coalesce, compact(list(df1[[.x]], df2[[.x]]))))

# # A tibble: 5 x 7
#      ID hello world football baseball hockey soccer
# 1     1     2     3       43        6      7      4
# 2     2     5     1       24       32      2      5
# 3     3    10     8        2       23      8     23
# 4     4     4    17        5       15      5     12
# 5     5     9     7       12       23      3     43


Here is the piped version, for those who prefer it:

union(names(df1), names(df2)) %>%
  set_names %>%
  map_dfc(~ list(df1[[.x]], df2[[.x]]) %>%
            compact %>%
            invoke(coalesce, .))


Explanations


- set_names() gives the character vector names identical to its values, so map_dfc() can name the output's columns correctly.
- df1[[.x]] returns NULL when .x is not a column of df1; we take advantage of this.
- df1 and df2 are each mentioned twice, and I can't think of any way around it.
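
The two behaviours this solution relies on can be checked in isolation. A small sketch, again with a hypothetical df1 rather than the question's data:

```r
library(purrr)

# Hypothetical data frame, not the question's data
df1 <- data.frame(ID = 1:3, hello = c(2, NA, 10))

# set_names() with no second argument names each element after itself
nm <- set_names(c("ID", "soccer"))
print(nm)

# [[ returns NULL for a column that does not exist
absent <- df1[["soccer"]]

# compact() then drops that NULL, leaving only the side that has the column
kept <- compact(list(df1[["soccer"]], c(4, 5, 23)))
print(length(kept))
```

Because the NULL is dropped before coalesce() is called, a column present in only one data frame passes through unchanged.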


Solution 1 is cleaner with respect to these points, so I recommend it.