Extract last non-missing value in row with data.table

囚心锁ツ 2021-01-04 04:03

I have a data.table of factor columns, and I want to pull out the label of the last non-missing value in each row. It's kind of a typical max.col situation, bu
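
The max.col idea mentioned in the question can be sketched in base R. This is a minimal illustration on hypothetical toy data (the question's real data isn't shown; the names df1, c1..c3 and the values are made up). The idea is to mark present cells as TRUE and let max.col with ties.method = "last" pick the right-most TRUE column in each row.

```r
# Hypothetical toy data: three factor columns with some missing cells
df1 <- data.frame(
  c1 = factor(c("x", "y", NA)),
  c2 = factor(c("p", NA, "q")),
  c3 = factor(c(NA, NA, "r"))
)

# TRUE where a cell is present, FALSE where it is NA
present <- !is.na(df1)

# Per row, the column index of the maximum; ties.method = "last" breaks the
# tie among the TRUE cells toward the right-most one
last_col <- max.col(present, ties.method = "last")

# as.matrix() turns the factor columns into a character matrix, so indexing
# with (row, column) pairs returns the labels
labels <- as.matrix(df1)
result <- labels[cbind(seq_len(nrow(df1)), last_col)]
result
```

A row that is entirely NA has all-FALSE cells, so max.col returns its last column and the extracted label is NA.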

5 Answers
  •  甜味超标
    2021-01-04 04:38

    We convert the 'data.frame' to a 'data.table' and create a row id column with setDT(df1, keep.rownames=TRUE). We then reshape from 'wide' to 'long' format with melt. Grouped by 'rn', if the 'value' column has no NA, we take its last element (value[.N]); otherwise we take the element just before the first NA. This gives the 'V1' column, which we extract with $V1. Note that this logic assumes the missing values in each row only trail the observed ones.

    melt(setDT(df1, keep.rownames = TRUE), id.vars = "rn")[,
         # no NA in the row: take the last value; otherwise the one before the first NA
         if (!any(is.na(value))) value[.N]
         else value[which(is.na(value))[1] - 1], by = rn]$V1
    #[1] "u" "q" "w" "h" "r" "t" "e" "t"
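
    A self-contained run of the same steps, as a sketch on hypothetical toy data (the column names c1..c3 and values are made up; the NAs only trail, matching the assumption above):

```r
library(data.table)

# Hypothetical toy data.frame of factor columns; NAs only trail the values
df1 <- data.frame(
  c1 = factor(c("a", "b", "c")),
  c2 = factor(c("d", NA, "e")),
  c3 = factor(c(NA, NA, "f"))
)

# setDT() converts in place and adds the row names as column 'rn';
# melt() stacks c1, c2, c3 column by column into 'variable'/'value'
res <- melt(setDT(df1, keep.rownames = TRUE), id.vars = "rn")[,
  # no NA in the row: last value; otherwise the value before the first NA
  if (!anyNA(value)) value[.N] else value[which(is.na(value))[1] - 1],
  by = rn]$V1
as.character(res)
```

    Because melt stacks the columns in order and by keeps groups in order of first appearance, the results come back in the original row order here.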
    

    If the data is already a data.table:

    dat[, rn := 1:.N]  # create the 'rn' column
    melt(dat, id.vars = "rn")[,  # melt from wide to long format
         if (!any(is.na(value))) value[.N]
         else value[which(is.na(value))[1] - 1], by = rn]$V1
    #[1] "u" "q" "w" "h" "r" "t" "e" "t"
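
    The "element before the first NA" rule only equals the last non-missing value when the NAs trail. A small sketch on hypothetical data with an interior NA in row 1 shows the difference; the which(!is.na(value)) variant is an illustrative alternative, not part of the original answer:

```r
library(data.table)

# Hypothetical data.table: row 1 has an interior NA (c2 missing, c3 present)
dat <- data.table(
  c1 = factor(c("a", "b")),
  c2 = factor(c(NA, "d")),
  c3 = factor(c("e", NA))
)
dat[, rn := 1:.N]  # row id column
long <- melt(dat, id.vars = "rn")

# Rule from the answer: last value, or the value before the first NA
before_first_na <- long[,
  if (!anyNA(value)) value[.N] else value[which(is.na(value))[1] - 1],
  by = rn]$V1

# Last non-missing value, wherever the NAs sit in the row
last_non_na <- long[, value[max(which(!is.na(value)))], by = rn]$V1

as.character(before_first_na)  # row 1 gives "a"
as.character(last_non_na)      # row 1 gives "e"
```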
    

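    Here is another option. It counts the non-missing values in each row and uses that count as the column index, so it also assumes the NAs only trail the observed values: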

    dat[, colInd := sum(!is.na(.SD)), by = 1:nrow(dat)][  # non-NA count per row
       , as.character(.SD[[.BY[[1]]]]), by = colInd]      # pick that column in each group
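
    A runnable sketch of this option on hypothetical toy data (names and values are made up). Since the second step groups by colInd, the output follows the first appearance of each colInd value rather than the row order. The second query is an illustrative variant, not from the original answer, that carries the row number (.I) along and sorts by it:

```r
library(data.table)

# Hypothetical data: NAs only trail the values in each row
dat <- data.table(
  c1 = factor(c("a", "b", "c", "d")),
  c2 = factor(c("e", NA, "f", NA)),
  c3 = factor(c(NA, NA, "g", NA))
)

# With trailing NAs, the non-NA count is the index of the last present column
dat[, colInd := sum(!is.na(.SD)), by = 1:nrow(dat)]

# .SD excludes the grouping column colInd, so .SD[[k]] is the k-th data column
res <- dat[, as.character(.SD[[.BY[[1]]]]), by = colInd]
res$V1  # grouped order: colInd 2, then 1, then 3

# Variant: keep the original row number and restore row order
res2 <- dat[, .(rid = .I, last = as.character(.SD[[.BY[[1]]]])),
            by = colInd][order(rid)]
res2$last
```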
    

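    Or, as @Frank mentioned in the comments, we can use na.rm=TRUE in melt to make it more compact. Since the NAs are dropped while melting, this returns the last non-missing value even when NAs sit between observed values: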

    melt(dat[, r := .I], id.vars = "r", na.rm = TRUE)[, value[.N], by = r]
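
    A sketch of this version on hypothetical data (names and values are made up). One thing to watch: with na.rm=TRUE, a row whose first columns are NA first appears later in the molten table, so by = r can return the rows out of order; keyby = r sorts the groups by row id. Rows that are entirely NA drop out.

```r
library(data.table)

# Hypothetical data: row 1 starts with NA, row 3 has an interior NA
dat <- data.table(
  c1 = factor(c(NA, "b", "c")),
  c2 = factor(c("d", "e", NA)),
  c3 = factor(c(NA, NA, "f"))
)
dat[, r := .I]  # row id

# Missing cells are dropped while stacking c1, c2, c3 column by column
long <- melt(dat, id.vars = "r", na.rm = TRUE)

res_by    <- long[, value[.N], by = r]     # groups in first-appearance order
res_keyby <- long[, value[.N], keyby = r]  # groups sorted by row id

as.character(res_by$V1)     # rows 2, 3, 1
as.character(res_keyby$V1)  # rows 1, 2, 3
```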
    
