Merge rows in a dataframe where the rows are disjoint and contain NAs

梦毁少年i 2020-12-01 16:20

I have a dataframe that has two rows:

| code | name  | v1 | v2 | v3 | v4 |
|------|-------|----|----|----|----|
| 345  | Yemen | NA | 2  | 3  | NA |
| 346  | Yemen | 4  | NA | NA | 5  |
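To make the question reproducible, here is a minimal base R sketch that rebuilds the two-row example and checks that the rows really are disjoint. The values for row 346 (v1 = 4, v4 = 5, v2 and v3 missing) are an assumption inferred from the printed output in the answer below, because the original row is cut off.

```r
# Reproducible version of the question's data frame.
# ASSUMPTION: row 346 values (v1 = 4, v4 = 5, v2/v3 missing) are inferred
# from the answer's output; the original row is truncated.
df <- data.frame(
  code = c(345L, 346L),
  name = c("Yemen", "Yemen"),
  v1   = c(NA, 4L),
  v2   = c(2L, NA),
  v3   = c(3L, NA),
  v4   = c(NA, 5L),
  stringsAsFactors = FALSE
)

# "Disjoint" here means no value column has a non-NA entry in more than one row.
vals <- df[, c("v1", "v2", "v3", "v4")]
stopifnot(all(colSums(!is.na(vals)) <= 1))
df
```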

2 Answers
  •  悲&欢浪女
    2020-12-01 17:11

    Adding dplyr and data.table solutions for completeness.

    Using dplyr: a custom sum_NA() summary, or dplyr::coalesce()

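    library(dplyr)
    
    # NA (of the column's type) if every value in the group is missing,
    # otherwise the sum of the non-missing values
    sum_NA <- function(x) {if (all(is.na(x))) x[NA_integer_] else sum(x, na.rm = TRUE)}
    
    # summarise_all() is superseded; summarise(across(everything(), sum_NA))
    # is the equivalent in dplyr >= 1.0.0
    df %>% 
      group_by(name) %>% 
      summarise_all(sum_NA)
    #> # A tibble: 1 x 6
    #>   name   code    v1    v2    v3    v4
    #> 1 Yemen   691     4     2     3     5
    
    # Ref: https://stackoverflow.com/a/45515491
    # Supply the column's elements as separate arguments by splicing them
    # into the dots of coalesce(), which returns the first non-NA value
    coalesce_by_column <- function(x) {
      dplyr::coalesce(!!! as.list(x))
    }
    
    df %>% 
      group_by(name) %>% 
      summarise_all(coalesce_by_column)
    #> # A tibble: 1 x 6
    #>   name   code    v1    v2    v3    v4
    #> 1 Yemen   345     4     2     3     5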
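The two dplyr helpers differ only when a group has more than one non-NA value in a column: sum_NA() adds them (which is why code becomes 691), while coalescing keeps the first one (345). On the disjoint value columns both give the same result. A minimal base R sketch of that difference on single vectors; first_non_na() is a hypothetical stand-in for what coalesce_by_column() does, not a dplyr function.

```r
# Same helper as in the answer: NA (of the column's type) if every value
# is missing, otherwise the sum of the non-missing values.
sum_NA <- function(x) {
  if (all(is.na(x))) x[NA_integer_] else sum(x, na.rm = TRUE)
}

# Hypothetical base R stand-in for coalescing a column's elements:
# the first non-NA value, or NA of the same type if there is none.
first_non_na <- function(x) x[!is.na(x)][1L]

code <- c(345L, 346L)                # two non-NA values in the group
v1   <- c(NA, 4L)                    # one non-NA value (disjoint case)
v0   <- c(NA_integer_, NA_integer_)  # all missing

sum_NA(code)        # 691: values are added
first_non_na(code)  # 345: first value wins
sum_NA(v1)          # 4
first_non_na(v1)    # 4: both agree when rows are disjoint
sum_NA(v0)          # NA
first_non_na(v0)    # NA
```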
    

    Using data.table

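    # Ref: https://stackoverflow.com/q/28036294/
    library(data.table)
    setDT(df)[, lapply(.SD, na.omit), by = name]
    #>     name code v1 v2 v3 v4
    #> 1: Yemen  345  4  2  3  5
    #> 2: Yemen  346  4  2  3  5
    
    # code has two non-NA values, so it yields two rows. Exclude it with
    # .SDcols rather than `code := NULL`, which deletes the column from df
    # by reference and so changes df for every later call
    setDT(df)[, lapply(.SD, na.omit), by = name, .SDcols = !"code"]
    #>     name v1 v2 v3 v4
    #> 1: Yemen  4  2  3  5
    
    setDT(df)[, lapply(.SD, sum_NA), by = name, .SDcols = !"code"]
    #>     name v1 v2 v3 v4
    #> 1: Yemen  4  2  3  5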
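If neither dplyr nor data.table is available, the same collapse can be sketched in base R with aggregate(). This assumes df has the columns shown in the question (row 346 values inferred from the outputs above). The key detail is na.action = na.pass: aggregate()'s formula method defaults to na.omit, which would drop every row containing an NA (here, both rows). first_non_na() is a hypothetical helper, not part of base R.

```r
# Rebuild the example (ASSUMPTION: row 346 values inferred from the outputs above).
df <- data.frame(
  code = c(345L, 346L),
  name = c("Yemen", "Yemen"),
  v1 = c(NA, 4L), v2 = c(2L, NA), v3 = c(3L, NA), v4 = c(NA, 5L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-missing value in a column, NA if none.
first_non_na <- function(x) x[!is.na(x)][1L]

# Collapse the value columns by name. na.pass keeps rows with NAs so that
# FUN sees them; the default (na.omit) would discard both rows.
merged <- aggregate(cbind(v1, v2, v3, v4) ~ name, data = df,
                    FUN = first_non_na, na.action = na.pass)
merged
```

Swapping first_non_na for the answer's sum_NA gives the same result here, since each value column has exactly one non-NA entry.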
    
