Move NAs to the end of each column in a data frame

生来不讨喜 2020-12-04 01:30

I have a data frame like this:

df <- structure(list(a = c(NA, NA, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L), b = c(NA, NA, NA, 1L, 2L, 3L, 4L, 5L, 6L, 7L), d = c(NA, NA,          


        
4 Answers
  •  夕颜 (OP)
    2020-12-04 02:23

    Another solution using lapply, which does not sort or reorder the data (as per your comments):

    df[] <- lapply(df, function(x) c(x[!is.na(x)], x[is.na(x)]))
    df
    #     a   b  d
    # 1   1  57  5
    # 2   5   2  7
    # 3  34   7  2
    # 4   7   9  8
    # 5   3   5  2
    # 6   5  12  5
    # 7   8 100 NA
    # 8   4  NA NA
    # 9  NA  NA NA
    # 10 NA  NA NA
    
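    The same idea can be wrapped in a small helper and checked on toy data. The helper names na_last and na_last2 and the toy data frame are hypothetical, chosen only for illustration. The second helper is an equivalent alternative: order() is stable, so ordering by is.na(x) (FALSE sorts before TRUE) moves the NAs to the end while every other value keeps its original position relative to the rest.

    ```r
    # Hypothetical helper: move NAs to the end of a vector,
    # keeping the non-NA values in their original order
    na_last <- function(x) c(x[!is.na(x)], x[is.na(x)])

    # Hypothetical alternative: order() is stable, so sorting by is.na(x)
    # only moves the NAs and leaves ties (all non-NA values) in place
    na_last2 <- function(x) x[order(is.na(x))]

    # Hypothetical toy data, not the data from the question
    toy <- data.frame(a = c(NA, 1L, NA, 3L), b = c(2L, NA, 5L, NA))

    toy[] <- lapply(toy, na_last)   # assigning into toy[] keeps the data.frame class
    toy
    # toy$a is now 1 3 NA NA and toy$b is now 2 5 NA NA
    ```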

    Or use data.table to update df by reference rather than creating a copy of it (this solution won't sort your data either):

    library(data.table)
    setDT(df)[, names(df) := lapply(.SD, function(x) c(x[!is.na(x)], x[is.na(x)]))]
    df
    #      a   b  d
    #  1:  1  57  5
    #  2:  5   2  7
    #  3: 34   7  2
    #  4:  7   9  8
    #  5:  3   5  2
    #  6:  5  12  5
    #  7:  8 100 NA
    #  8:  4  NA NA
    #  9: NA  NA NA
    # 10: NA  NA NA
    
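    Another way to do the by-reference update in data.table is a loop over set(), which the data.table documentation describes as a low-overhead, loop-able version of :=. This is a sketch on a hypothetical toy table, not the data from the question.

    ```r
    library(data.table)

    # Hypothetical toy data, not the data from the question
    dt <- data.table(a = c(NA, 1L, NA, 3L), b = c(2L, NA, 5L, NA))

    # Replace each column in place: non-NA values first (original order), then the NAs
    for (col in names(dt)) {
      x <- dt[[col]]
      set(dt, j = col, value = c(x[!is.na(x)], x[is.na(x)]))
    }
    dt
    ```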

    Some benchmarks show that the base solution is by far the fastest here:

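    library("microbenchmark")
    # beetroot is assumed to be the helper function from another answer; it is not defined here
    david <- function() lapply(df, function(x) c(x[!is.na(x)], x[is.na(x)]))
    dt <- setDT(df)  # note: setDT() converts df itself to a data.table, by reference
    david.dt <- function() dt[, names(dt) := lapply(.SD, function(x) c(x[!is.na(x)], x[is.na(x)]))]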
    
    microbenchmark(as.data.frame(lapply(df, beetroot)), david(), david.dt())
    # Unit: microseconds
    #                                 expr      min       lq   median        uq      max neval
    #  as.data.frame(lapply(df, beetroot)) 1145.224 1215.253 1274.417 1334.7870 4028.507   100
    #                              david()  116.515  127.382  140.965  149.7185  308.493   100
    #                           david.dt() 3087.335 3247.920 3330.627 3415.1460 6464.447   100
    
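    In the benchmark above, david() returns a plain list while the first expression also builds a data.frame, so the timings are not strictly like-for-like. The sketch below is a rough check on larger data using only base R's system.time(); the data size, the 20% NA share, and the names make_col, f_subset and f_order are all hypothetical, and no timings are claimed here. It also checks that the subsetting approach and the stable order() approach give identical results.

    ```r
    set.seed(1)
    n <- 1e5

    # Hypothetical column generator: random integers with about 20% NAs
    make_col <- function(n) {
      x <- sample.int(100L, n, replace = TRUE)
      x[sample.int(n, n %/% 5)] <- NA
      x
    }
    big <- as.data.frame(setNames(lapply(1:5, function(i) make_col(n)), paste0("V", 1:5)))

    # Both variants return a data.frame, so they are compared like-for-like
    f_subset <- function(df) { df[] <- lapply(df, function(x) c(x[!is.na(x)], x[is.na(x)])); df }
    f_order  <- function(df) { df[] <- lapply(df, function(x) x[order(is.na(x))]); df }

    stopifnot(identical(f_subset(big), f_order(big)))  # same result, since order() is stable

    system.time(for (i in 1:20) f_subset(big))
    system.time(for (i in 1:20) f_order(big))
    ```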
