I have a data.table of factor columns, and I want to pull out the label of the last non-missing value in each row. It's kind of a typical max.col situation, but
We convert the 'data.frame' to a 'data.table' and create a row id column (setDT(df1, keep.rownames=TRUE)). We reshape from 'wide' to 'long' format with melt. Grouped by 'rn', if there is no NA element in the 'value' column, we take the last element of 'value' (value[.N]); otherwise, we take the element just before the first NA. The result lands in the 'V1' column, which we extract ($V1).
melt(setDT(df1, keep.rownames=TRUE), id.vars='rn')[,
    if(!any(is.na(value))) value[.N]
    else value[which(is.na(value))[1]-1], by = rn]$V1
#[1] "u" "q" "w" "h" "r" "t" "e" "t"
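The question's data is not shown, so here is a minimal sketch on a small made-up data.frame to trace what each 'rn' group sees after melt. The name toy, the columns a, b, c and their values are assumptions for illustration only. It uses as.data.table on a copy rather than setDT, so the original data.frame is left untouched.

```r
library(data.table)

# Hypothetical toy data (not the asker's): factor columns with trailing NAs only
toy <- data.frame(
  a = factor(c("x", "y", "z")),
  b = factor(c("u", NA, "w")),
  c = factor(c("q", NA, NA))
)

# Copy into a data.table; keep.rownames = TRUE adds the 'rn' id column
dt <- as.data.table(toy, keep.rownames = TRUE)

# Wide -> long: one row per (rn, variable) pair, factor labels become character
long <- melt(dt, id.vars = "rn")

# Per row id: last value if nothing is missing, else the value before the first NA
res <- long[,
  if (!anyNA(value)) value[.N]
  else value[which(is.na(value))[1L] - 1L],
  by = rn]$V1

print(res)
# row 1 has no NA, so its last value "q" is returned;
# rows 2 and 3 return the value just before their first NA: "y" and "w"
```

Note that a row whose first column is already NA would index value[0] and silently drop out of the result, so the lengths would no longer match the number of rows.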
If the data is already a data.table:
dat[, rn := 1:.N]  # create the 'rn' column
melt(dat, id.vars='rn')[,  # melt from wide to long format
    if(!any(is.na(value))) value[.N]
    else value[which(is.na(value))[1]-1], by = rn]$V1
#[1] "u" "q" "w" "h" "r" "t" "e" "t"
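Here is another option. It counts the non-NA values in each row and uses that count as the column index, so it assumes the NAs in a row are all trailing. The result comes back grouped by 'colInd' rather than in the original row order.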
dat[, colInd := sum(!is.na(.SD)), by=1:nrow(dat)][
, as.character(.SD[[.BY[[1]]]]), by=colInd]
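Since the question frames this as a max.col situation, here is a hedged base R sketch of that idea for comparison. It is not part of the original answer. The toy table and the names m, idx and res are assumptions. max.col with ties.method = "last" returns, for each row, the position of the last non-NA column, which also works when the NAs are not trailing.

```r
library(data.table)

# Hypothetical toy data: row 1 has an interior NA
dat <- data.table(
  a = factor(c("x", "y", "z")),
  b = factor(c(NA, "u", "w")),
  c = factor(c("q", NA, NA))
)

# Character matrix of the factor labels (NA stays NA)
m <- as.matrix(dat)

# Column index of the last non-NA entry in each row
idx <- max.col(!is.na(m), ties.method = "last")

# Matrix indexing with (row, col) pairs keeps the original row order
res <- m[cbind(seq_len(nrow(m)), idx)]

print(res)
```

A row that is entirely NA would yield NA here, because every column ties and the last one is picked.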
Or, as @Frank mentioned in the comments, we can use na.rm=TRUE in melt to make it more compact. With the NA rows dropped, the last remaining value in each group is the last non-missing one:
melt(dat[, r := .I], id="r", na.rm=TRUE)[, value[.N], by=r]
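The two melt variants agree when NAs only appear at the end of a row, but they can differ when an NA sits in the middle. The sketch below illustrates this on made-up data. The table dat2 and the result names are assumptions, not the asker's data.

```r
library(data.table)

# Hypothetical toy data: row 1 has an interior NA (x, NA, q)
dat2 <- data.table(
  a = factor(c("x", "y", "z")),
  b = factor(c(NA, "u", "w")),
  c = factor(c("q", NA, NA))
)
dat2[, r := .I]  # row id, added by reference

# Variant 1: value just before the first NA (or the last value if none is NA)
before_first_na <- melt(dat2, id.vars = "r")[,
  if (!anyNA(value)) value[.N]
  else value[which(is.na(value))[1L] - 1L],
  by = r]$V1

# Variant 2: drop NA rows while melting, then take the last remaining value
last_non_na <- melt(dat2, id.vars = "r", na.rm = TRUE)[, value[.N], by = r]$V1

print(before_first_na)  # row 1 stops at "x", before the interior NA
print(last_non_na)      # row 1 gives "q", the true last non-missing label
```

For the stated goal of the last non-missing value in each row, the na.rm=TRUE form is the one that matches when interior NAs are possible. A row that is entirely NA disappears from its result, though.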