Combine column to remove NA's

野的像风 2020-11-28 06:44

I have some columns in R and for each row there will only ever be a value in one of them; the rest will be NA's. I want to combine these into one column with the non-NA val

10 answers
  • 2020-11-28 07:01

    You can use unlist to turn the columns into one vector. Afterwards, na.omit can be used to remove the NAs.

    cbind(data[1], mycol = na.omit(unlist(data[-1])))
    
       a mycol
    x1 A     1
    x2 B     2
    y3 C     3
    z4 D     4
    z5 E     5
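
    One caveat worth spelling out: unlist stacks the columns one after another, so the result only lines up with the rows when the non-NA values already appear in row order across the columns. The sketch below uses a small made-up data frame (data2 is hypothetical, not the OP's data) to show the mismatch, and then a row-order-safe variant based on max.col, which is a swapped-in technique rather than part of the answer above.

    ```r
    # Hypothetical data where the non-NA values are NOT in row order across columns
    data2 <- data.frame(a = c("A", "B"),
                        x = c(NA, 2),
                        y = c(1, NA))

    # unlist() stacks the columns one after another: x1, x2, y1, y2
    stacked <- na.omit(unlist(data2[-1]))
    as.vector(stacked)   # 2 1, which no longer lines up with rows A, B

    # Row-order-safe variant: index of the first non-NA column in each row
    m <- as.matrix(data2[-1])
    first_non_na <- max.col(!is.na(m), ties.method = "first")

    # Matrix indexing with (row, column) pairs pulls one value per row
    data2$mycol <- m[cbind(seq_len(nrow(m)), first_non_na)]
    ```

    With the OP's data both approaches agree, because there the values happen to run top to bottom as you move from x to z.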
    
  • 2020-11-28 07:01

    Something like this? Summing each row with na.rm = TRUE returns the single non-NA value:

    data.frame(a=data$a, mycol=apply(data[,-1],1,sum,na.rm=TRUE))
    

    This gives:

      a mycol
    1 A     1
    2 B     2
    3 C     3
    4 D     4
    5 E     5
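
    For what it's worth, the same row sum can probably be written with rowSums, which avoids the apply call. This is only a sketch, assuming numeric columns and the sample data posted elsewhere in this thread; the all_na handling is my own addition, since a row with no value at all would otherwise come out as 0 instead of NA.

    ```r
    data <- data.frame(a = c("A", "B", "C", "D", "E"),
                       x = c(1, 2, NA, NA, NA),
                       y = c(NA, NA, 3, NA, NA),
                       z = c(NA, NA, NA, 4, 5))

    # Vectorised equivalent of apply(data[, -1], 1, sum, na.rm = TRUE)
    mycol <- rowSums(data[, -1], na.rm = TRUE)

    # Rows where every value column is NA sum to 0, so set them back to NA
    all_na <- rowSums(is.na(data[, -1])) == ncol(data) - 1
    mycol[all_na] <- NA

    result <- data.frame(a = data$a, mycol = mycol)
    ```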
    
  • 2020-11-28 07:03

    In a related link (suppress NAs in paste()) I present a version of paste with a na.rm option (with the unfortunate name of paste5).

    With this the code becomes

    cols <- c("x", "y", "z")
    cbind.data.frame(a = data$a, mycol = paste5(data[, cols], na.rm = TRUE))
    

    The output of paste5 is a character vector, which works if you have character data; otherwise you'll need to coerce it to the type you want.
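
    Since paste5 lives in the linked answer and is not reproduced here, the following is a rough stand-in to illustrate the idea and the coercion step. paste_non_na is a hypothetical helper written for this sketch, not the paste5 implementation, and it assumes the value columns are all numeric.

    ```r
    data <- data.frame(a = c("A", "B", "C", "D", "E"),
                       x = c(1, 2, NA, NA, NA),
                       y = c(NA, NA, 3, NA, NA),
                       z = c(NA, NA, NA, 4, 5))
    cols <- c("x", "y", "z")

    # Hypothetical helper (NOT paste5): paste each row's non-NA entries together
    paste_non_na <- function(df, sep = "") {
      apply(df, 1, function(row) paste(row[!is.na(row)], collapse = sep))
    }

    # The pasted result is character, so coerce it back to numeric if needed
    mycol_chr <- unname(paste_non_na(data[, cols]))
    mycol_num <- as.numeric(mycol_chr)

    result <- cbind.data.frame(a = data$a, mycol = mycol_num)
    ```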

  • 2020-11-28 07:05

    Though this is not the OP's case, some people seem to like the sum-based approach, so how about thinking in terms of the mean and the mode to make the answer more general? This answer matches the title, which is what many people will search for.

    data <- data.frame('a' = c('A','B','C','D','E'),
                       'x' = c(1,2,NA,NA,9),
                       'y' = c(NA,6,3,NA,5),
                       'z' = c(NA,NA,NA,4,5))
    
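    # One list element per row, holding only the value columns x, y, z
    splitdf <- split(data[, 2:4], seq_len(nrow(data)))

    # Row-wise mean, ignoring NAs
    data$mean <- unlist(lapply(splitdf, function(x) mean(unlist(x), na.rm = TRUE)))
    # Row-wise mode(s): most frequent non-NA value(s), comma-separated on ties
    data$mode <- unlist(lapply(splitdf, function(x) {
      vals <- na.omit(unique(unlist(x)))
      tab <- tabulate(match(unlist(x), vals))
      paste(vals[tab == max(tab)], collapse = ", ")
    }))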
    
    data
      a  x  y  z     mean mode
    1 A  1 NA NA 1.000000    1
    2 B  2  6 NA 4.000000 2, 6
    3 C NA  3 NA 3.000000    3
    4 D NA NA  4 4.000000    4
    5 E  9  5  5 6.333333    5
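
    If only the mean is needed, a shorter route is probably rowMeans, which skips the split/lapply step. A minimal sketch on the same data; row_mean is just an illustrative name.

    ```r
    data <- data.frame(a = c("A", "B", "C", "D", "E"),
                       x = c(1, 2, NA, NA, 9),
                       y = c(NA, 6, 3, NA, 5),
                       z = c(NA, NA, NA, 4, 5))

    # Mean of the non-NA values in each row of the value columns
    row_mean <- rowMeans(data[, c("x", "y", "z")], na.rm = TRUE)
    ```

    Note that a row made up entirely of NAs would give NaN here rather than NA.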
    
  • 2020-11-28 07:06

    A dplyr::coalesce-based solution could be:

    library(dplyr)

    # coalesce() takes the first non-NA value at each position
    data %>% mutate(mycol = coalesce(x, y, z)) %>%
             select(a, mycol)
    #   a mycol
    # 1 A     1
    # 2 B     2
    # 3 C     3
    # 4 D     4
    # 5 E     5 
    

    Data

    data <- data.frame('a' = c('A','B','C','D','E'),
                     'x' = c(1,2,NA,NA,NA),
                     'y' = c(NA,NA,3,NA,NA),
                     'z' = c(NA,NA,NA,4,5))
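
    For anyone who would rather not load dplyr, the same "first non-NA value per row" idea can be sketched in base R with Reduce and ifelse. This is an approximation of what coalesce does, not its implementation, and since ifelse drops attributes it is only meant for plain numeric or character columns.

    ```r
    data <- data.frame(a = c("A", "B", "C", "D", "E"),
                       x = c(1, 2, NA, NA, NA),
                       y = c(NA, NA, 3, NA, NA),
                       z = c(NA, NA, NA, 4, 5))
    cols <- c("x", "y", "z")

    # Walk through the columns left to right, filling NAs from the next column
    mycol <- Reduce(function(a, b) ifelse(is.na(a), b, a), data[cols])

    result <- data.frame(a = data$a, mycol = mycol)
    ```

    Because cols is an ordinary character vector, this extends to any number of columns without listing them by hand.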
    
  • 2020-11-28 07:08

    Here's a more general (but even simpler) solution which extends to all column types (factors, characters etc.) with non-ordered NA's. The strategy is simply to merge the non-NA values of other columns into your merged column using is.na for indexing:

    data$m = data$x  # your new merged column starts with x
    data$m[!is.na(data$y)] = data$y[!is.na(data$y)]  # merge with y
    data$m[!is.na(data$z)] = data$z[!is.na(data$z)]  # merge with z
    
    > data
      a  x  y  z m
    1 A  1 NA NA 1
    2 B  2 NA NA 2
    3 C NA  3 NA 3
    4 D NA NA  4 4
    5 E NA NA  5 5
    

    Note that this will overwrite existing values in m if there are several non-NA values in the same row. If you have a lot of columns you could automate this by looping over colnames(data).
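
    A minimal sketch of that loop, assuming the first column a is the label column and every other column should be merged (cols and keep are just names picked for this example):

    ```r
    data <- data.frame(a = c("A", "B", "C", "D", "E"),
                       x = c(1, 2, NA, NA, NA),
                       y = c(NA, NA, 3, NA, NA),
                       z = c(NA, NA, NA, 4, 5))

    # Every column except the label column takes part in the merge
    cols <- setdiff(colnames(data), "a")

    # Start from the first value column, then merge the rest one by one
    data$m <- data[[cols[1]]]
    for (col in cols[-1]) {
      keep <- !is.na(data[[col]])          # rows where this column has a value
      data$m[keep] <- data[[col]][keep]    # later columns overwrite earlier ones
    }
    ```

    As with the manual version, the last non-NA column wins when a row has more than one value.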
