I have a data.frame in which several rows come from a merge but were not completely merged:
b <- read.table(text = "
ID Age Steatosis
Llopis's request to keep both rows if a given ID has different information for a column complicates matters. First let's create some example data that illustrates the situation:
b <- read.table(text = "ID Age Steatosis Mallory Lille_dico Lille_3 Bili.AHHS2cat
HA-09 16 <NA> <NA> <NA> 5 NA
HA-09 16 <33% no/occasional NA <NA> 1
HA-10 20 <NA> no <NA> 2 NA
HA-10 20 <NA> yes 0 NA NA",
na.strings = c("NA", "<NA>"), header = TRUE)
     ID Age Steatosis       Mallory Lille_dico Lille_3 Bili.AHHS2cat
1 HA-09  16      <NA>          <NA>         NA       5            NA
2 HA-09  16      <33% no/occasional         NA      NA             1
3 HA-10  20      <NA>            no         NA       2            NA
4 HA-10  20      <NA>           yes          0      NA            NA
This can still be accomplished, but the custom function for summarization (let's call it f) gets a little more complicated:
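f <- function(x) {
  # drop the rows that carry no information for this ID/variable pair
  x <- x[!is.na(x$value), ]
  if (nrow(x) > 0) {
    # collapse identical values, ignoring the old row identifier
    y <- unique(x[colnames(x) != 'row.ID'])
    # renumber what is left: 1 if the values agree, 1, 2, ... if they conflict
    y$row.ID <- seq_len(nrow(y))
    return(y)
  } else {
    # nothing but NAs: contribute no rows for this group
    return(data.frame())
  }
}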
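To see what f does before it is wired into a pipeline, here is a small base R sketch that calls it on three hand-made groups. The data frames `conflict`, `duplicate` and `empty` are hypothetical stand-ins for the per-ID, per-variable pieces f is meant to receive. They are not part of the original data, and f is repeated so the snippet runs on its own.

```r
# f as defined above, repeated so this snippet is self-contained
f <- function(x) {
  x <- x[!is.na(x$value), ]
  if (nrow(x) > 0) {
    y <- unique(x[colnames(x) != 'row.ID'])
    y$row.ID <- seq_len(nrow(y))
    return(y)
  } else {
    return(data.frame())
  }
}

# Hypothetical groups (assumed shape: ID, Age, variable, value, row.ID)
conflict <- data.frame(ID = "HA-10", Age = 20, variable = "Mallory",
                       value = c("no", "yes"), row.ID = 1:2)
duplicate <- data.frame(ID = "HA-09", Age = 16, variable = "Lille_3",
                        value = c("5", "5"), row.ID = 1:2)
empty <- data.frame(ID = "HA-10", Age = 20, variable = "Steatosis",
                    value = c(NA_character_, NA_character_), row.ID = 1:2)

res_conflict  <- f(conflict)   # differing values: both rows are kept
res_duplicate <- f(duplicate)  # identical values: collapsed to one row
res_empty     <- f(empty)      # only NAs: a data frame with zero rows

print(res_conflict)
print(res_duplicate)
print(nrow(res_empty))
```

Conflicting values are what produce more than one row.ID per ID, which is how both rows survive the reshaping step.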
Notice that this function references a column called "row.ID", which we will create before applying the function:
library(tidyverse)  # gives access to the dplyr and tidyr packages
b2 <- gather(b, variable, value, -ID, -Age) %>%  # gather the many columns into key/value pairs ('variable', 'value') for each ID
  group_by(ID, variable) %>%                     # perform subsequent operations per ID and variable
  mutate(row.ID = 1:n()) %>%                     # add a row identifier
  do(f(.)) %>%                                   # apply our custom function
  spread(variable, value, convert = TRUE) %>%    # un-gather the variable/value columns
  ungroup()                                      # remove grouping metadata
  ID      Age row.ID Bili.AHHS2cat Lille_3 Lille_dico Mallory       Steatosis
1 HA-09    16      1             1       5         NA no/occasional <33%
2 HA-10    20      1            NA       2          0 no            <NA>
3 HA-10    20      2            NA      NA         NA yes           <NA>
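In current tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The sketch below is one possible way to get the same kind of result with those functions. It is not the method used above, and it assumes dplyr 1.0 or later and tidyr 1.0 or later. The example data is re-entered with a plain NA in every missing cell so the snippet runs on its own. The name `b3` and the use of distinct() in place of the custom function are choices made for this illustration.

```r
library(dplyr)
library(tidyr)

# Example data, re-entered with plain NA for every missing cell
b <- read.table(text = "ID Age Steatosis Mallory Lille_dico Lille_3 Bili.AHHS2cat
HA-09 16 NA NA NA 5 NA
HA-09 16 <33% no/occasional NA NA 1
HA-10 20 NA no NA 2 NA
HA-10 20 NA yes 0 NA NA",
  header = TRUE, stringsAsFactors = FALSE)

b3 <- b %>%
  # the value column must have one type, so convert everything to character
  mutate(across(-c(ID, Age), as.character)) %>%
  # long format, dropping the cells that carry no information
  pivot_longer(-c(ID, Age), names_to = "variable", values_to = "value",
               values_drop_na = TRUE) %>%
  # collapse identical values for the same ID and variable
  distinct() %>%
  # number the remaining (conflicting) values within each ID and variable
  group_by(ID, variable) %>%
  mutate(row.ID = row_number()) %>%
  ungroup() %>%
  # back to wide format: one row per ID, Age and row.ID
  pivot_wider(names_from = variable, values_from = value) %>%
  arrange(ID, row.ID) %>%
  # restore numeric columns that were turned into character above
  type.convert(as.is = TRUE)

print(b3)
```

The columns may come out in a different order than with spread(), since pivot_wider() does not sort the new column names by default.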