Merging rows with shared information

故里飘歌 2021-01-07 04:18

I have a data.frame with several rows that come from a merge but were not completely merged:

b <- read.table(text = "
      ID   Age    Steatosis

4 answers
梦谈多话 2021-01-07 04:46

    Llopis's request to keep both rows when a given ID has conflicting information in a column complicates matters. First, let's create some example data that illustrates the situation:

    b <- read.table(text = "ID,Age,Steatosis,Mallory,Lille_dico,Lille_3,Bili.AHHS2cat
                    HA-09,16,,,NA,5,NA
                    HA-09,16,<33%,no/occasional,NA,NA,1
                    HA-10,20,no,,NA,2,NA
                    HA-10,20,yes,,0,NA,NA",
                    sep = ",", strip.white = TRUE, # comma-separated so empty fields survive
                    na.strings = c("NA", ""), header = TRUE)
    
         ID Age Steatosis       Mallory Lille_dico Lille_3 Bili.AHHS2cat
    1 HA-09  16      <NA>          <NA>         NA       5            NA
    2 HA-09  16      <33% no/occasional         NA      NA             1
    3 HA-10  20        no          <NA>         NA       2            NA
    4 HA-10  20       yes          <NA>          0      NA            NA
    

    This can still be accomplished, but the custom function for summarization (let's call it f) gets a little more complicated:

    f <- function(x) {
        x <- x[!is.na(x$value),]
        if (nrow(x) > 0) {
            y <- unique(x[colnames(x) != 'row.ID'])
            y$row.ID <- 1:nrow(y)
            return(y)
        } else {
            return(data.frame())
        }
    }
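As a quick sanity check of what f does, here is a small base-R sketch that runs it on one hand-built group of the kind do() passes in (a single ID/variable pair after gather()). The data frame grp and its values are hypothetical. The copy of f below uses `drop = FALSE` and `seq_len()` as small safety tweaks but otherwise follows the same logic.

```r
# Copy of f; `drop = FALSE` keeps a data frame even for one-column input
f <- function(x) {
  x <- x[!is.na(x$value), , drop = FALSE]      # drop missing values
  if (nrow(x) > 0) {
    y <- unique(x[colnames(x) != "row.ID"])    # collapse identical values
    y$row.ID <- seq_len(nrow(y))               # renumber the distinct values
    return(y)
  } else {
    return(data.frame())                       # all-NA group: no rows
  }
}

# Hypothetical single ID/variable group, shaped like what do() receives
grp <- data.frame(ID = "HA-10", variable = "Steatosis",
                  value = c("no", NA, "no", "yes"), row.ID = 1:4,
                  stringsAsFactors = FALSE)

f(grp)
#      ID  variable value row.ID
# 1 HA-10 Steatosis    no      1
# 4 HA-10 Steatosis   yes      2
```

The NA row is dropped, the repeated "no" collapses, and the two distinct values are renumbered 1 and 2. That renumbering is what later lets spread() put conflicting values on separate rows. An all-NA group returns an empty data frame, so it contributes no rows.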
    

    Notice that this function references a column called "row.ID", which we will create before applying the function:

    library(tidyverse) # gives access to dplyr and tidyr packages
    
    b2 <- gather(b, variable, value, -ID, -Age) %>% # gather the many columns into a simplified key/value pair of columns (one called 'variable', the other, 'value') for each ID
        group_by(ID, variable) %>% # perform subsequent operations per ID and variable
        mutate(row.ID = 1:n()) %>% # add a row identifier
        do(f(.)) %>% # apply our custom function
        spread(variable, value, convert = TRUE) %>% # un-gather the variable/value columns
        ungroup # remove grouping metadata
    
          ID   Age row.ID Bili.AHHS2cat Lille_3 Lille_dico       Mallory Steatosis
    1  HA-09    16      1             1       5         NA no/occasional      <33%
    2  HA-10    20      1            NA       2          0                  no
    3  HA-10    20      2            NA      NA         NA                 yes
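The same idea (gather, drop NAs and duplicates, renumber, spread) can also be sketched in base R for readers who would rather avoid the tidyverse. This is a swapped-in approach using stats::reshape() rather than the answer's code. collapse_rows() is a hypothetical helper name, and the sketch assumes that ID and Age together identify a patient.

```r
# Example data from the answer; empty fields become NA
b <- read.table(text = "ID,Age,Steatosis,Mallory,Lille_dico,Lille_3,Bili.AHHS2cat
HA-09,16,,,NA,5,NA
HA-09,16,<33%,no/occasional,NA,NA,1
HA-10,20,no,,NA,2,NA
HA-10,20,yes,,0,NA,NA",
                sep = ",", header = TRUE, na.strings = c("NA", ""),
                stringsAsFactors = FALSE)

# Hypothetical helper: collapse partially merged rows per ID/Age, keeping
# an extra row for an ID whenever a column holds conflicting values
collapse_rows <- function(df, id_cols = c("ID", "Age")) {
  value_cols <- setdiff(names(df), id_cols)

  # 1. Long format (like gather): one row per ID/Age/variable/value.
  #    Values are stored as character so mixed column types can be stacked.
  long <- do.call(rbind, lapply(value_cols, function(v) {
    data.frame(df[id_cols], variable = v, value = as.character(df[[v]]),
               stringsAsFactors = FALSE)
  }))

  # 2. Drop missing values and exact duplicates (the job of f)
  long <- unique(long[!is.na(long$value), , drop = FALSE])

  # 3. Number the distinct values within each ID/Age/variable group
  grp <- interaction(long[c(id_cols, "variable")], drop = TRUE)
  long$row.ID <- ave(seq_len(nrow(long)), grp, FUN = seq_along)

  # 4. Wide format again (like spread): one column per variable
  wide <- reshape(long, idvar = c(id_cols, "row.ID"),
                  timevar = "variable", direction = "wide")
  names(wide) <- sub("^value\\.", "", names(wide))
  rownames(wide) <- NULL

  # 5. Restore numeric types (like convert = TRUE)
  wide[] <- lapply(wide, type.convert, as.is = TRUE)
  wide
}

res <- collapse_rows(b)
print(res)
```

On the example data this gives the same three rows as the tidyverse version. The only difference is that the value columns keep their original order instead of being sorted alphabetically by spread().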
    
