Move NAs to the end of each column in a data frame

生来不讨喜 2020-12-04 01:30

I have such a data frame:

df <- structure(list(a = c(NA, NA, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L), b = c(NA, NA, NA, 1L, 2L, 3L, 4L, 5L, 6L, 7L), d = c(NA, NA,          


        
4 answers
  • 2020-12-04 02:15

    For fun, you can also make use of length<- and na.omit.

    Here's what that combination would do:

    x <- c(NA, 1, 2, 3)
    x
    # [1] NA  1  2  3
    `length<-`(na.omit(x), length(x))
    # [1]  1  2  3 NA
    

    Applied to your problem, the solution would be:

    df[] <- lapply(df, function(x) `length<-`(na.omit(x), nrow(df)))
    df
    #     a   b  d
    # 1   1  57  5
    # 2   5   2  7
    # 3  34   7  2
    # 4   7   9  8
    # 5   3   5  2
    # 6   5  12  5
    # 7   8 100 NA
    # 8   4  NA NA
    # 9  NA  NA NA
    # 10 NA  NA NA
    
  • 2020-12-04 02:17

    If you have only a small number of columns, I suggest:

    # note: sort() also reorders the non-NA values in ascending order
    data.frame(a = sort(df$a, na.last = TRUE), b = sort(df$b, na.last = TRUE), d = sort(df$d, na.last = TRUE))
    

    Best, Adii_

  • 2020-12-04 02:20

    After completely misunderstanding the question, here is my final answer:

    # named after beetroot for being the first to ever need this functionality
    beetroot <- function(x) {
        # count NA
        num.na <- sum(is.na(x))
        # remove NA
        x <- x[!is.na(x)]
        # glue the number of NAs at the end
        x <- c(x, rep(NA, num.na))
        return(x)
    }
    
    # apply beetroot over each column in the dataframe
    as.data.frame(lapply(df, beetroot))
    

    It will count the NAs, remove the NAs, and glue NAs at the bottom for each column in the data frame.
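
    As a quick illustration on visible data, here is a minimal sketch of the same idea. The helper name na_to_end and the data frame toy are made up for this example and are not part of the original question:

    ```r
    # Hypothetical helper (same idea as beetroot above): keep the non-NA
    # values in their original order and pad the tail with NA.
    na_to_end <- function(x) {
      keep <- !is.na(x)
      c(x[keep], rep(NA, sum(!keep)))
    }

    # Made-up example data, not the data frame from the question
    toy <- data.frame(a = c(NA, 3L, 1L, NA), b = c(2.5, NA, NA, 0.5))

    result <- as.data.frame(lapply(toy, na_to_end))
    result
    #    a   b
    # 1  3 2.5
    # 2  1 0.5
    # 3 NA  NA
    # 4 NA  NA
    ```

    Each column is handled independently, so values that shared a row in the input will generally not share a row in the output.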

  • 2020-12-04 02:23

    Another solution using lapply, without sorting or reordering the data (per your comments):

    df[] <- lapply(df, function(x) c(x[!is.na(x)], x[is.na(x)]))
    df
    #     a   b  d
    # 1   1  57  5
    # 2   5   2  7
    # 3  34   7  2
    # 4   7   9  8
    # 5   3   5  2
    # 6   5  12  5
    # 7   8 100 NA
    # 8   4  NA NA
    # 9  NA  NA NA
    # 10 NA  NA NA
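
    To make the "without sorting" point concrete, here is a small sketch comparing sort() with the subsetting approach on a made-up vector (x is illustrative only):

    ```r
    x <- c(NA, 34, 5, NA, 7)

    # sort() moves the NAs last but also reorders the remaining values
    sorted <- sort(x, na.last = TRUE)
    sorted
    # [1]  5  7 34 NA NA

    # logical subsetting keeps the non-NA values in their original order
    kept <- c(x[!is.na(x)], x[is.na(x)])
    kept
    # [1] 34  5  7 NA NA
    ```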
    

    Or use data.table to update df by reference rather than creating a copy of it (this solution won't sort your data either):

    library(data.table)
    setDT(df)[, names(df) := lapply(.SD, function(x) c(x[!is.na(x)], x[is.na(x)]))]
    df
    #      a   b  d
    #  1:  1  57  5
    #  2:  5   2  7
    #  3: 34   7  2
    #  4:  7   9  8
    #  5:  3   5  2
    #  6:  5  12  5
    #  7:  8 100 NA
    #  8:  4  NA NA
    #  9: NA  NA NA
    # 10: NA  NA NA
    

    Some benchmarks on this small data frame show that the base lapply solution above (david() below) is the fastest by far:

    library("microbenchmark")
    david <- function() lapply(df, function(x) c(x[!is.na(x)], x[is.na(x)]))
    dt <- setDT(df)
    david.dt <- function() dt[, names(dt) := lapply(.SD, function(x) c(x[!is.na(x)], x[is.na(x)]))]
    
    microbenchmark(as.data.frame(lapply(df, beetroot)), david(), david.dt())
    # Unit: microseconds
    #                                 expr      min       lq   median        uq      max neval
    #  as.data.frame(lapply(df, beetroot)) 1145.224 1215.253 1274.417 1334.7870 4028.507   100
    #                              david()  116.515  127.382  140.965  149.7185  308.493   100
    #                           david.dt() 3087.335 3247.920 3330.627 3415.1460 6464.447   100
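
    Before comparing speed, it may be worth checking that the approaches in this thread agree with each other. The sketch below does that on a made-up data frame (toy and the via_* names are hypothetical); it assumes plain integer columns and has not been checked against factors or other classed columns:

    ```r
    # Made-up example data, not the data frame from the question
    toy <- data.frame(a = c(NA, 5L, 1L, NA), b = c(7L, NA, 2L, 9L))

    # length<- / na.omit approach
    via_length <- lapply(toy, function(x) `length<-`(na.omit(x), nrow(toy)))
    # count-and-pad approach (what beetroot does)
    via_rep <- lapply(toy, function(x) c(x[!is.na(x)], rep(NA, sum(is.na(x)))))
    # subsetting approach (what david() does)
    via_subset <- lapply(toy, function(x) c(x[!is.na(x)], x[is.na(x)]))

    # TRUE if all three return the same list of columns
    same <- identical(via_length, via_rep) && identical(via_rep, via_subset)
    print(same)
    ```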
    