Column name of last non-NA row per row; using tidyverse solution?

假装没事ソ posted on 2019-12-02 02:42:13

Write a function that solves the problem, following James' suggestion but a little more robust (it handles the case when all answers are NA):

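f0 = function(df) {
    # column index where a value is present, 0 where it is NA
    idx = ifelse(is.na(df), 0L, col(df))
    # largest index per row = last answered column (0 if none)
    apply(idx, 1, max)
}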
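To see what f0 returns, here is a small worked example. The data frame and its column names (q1, q2, q3) are made up for illustration and are not from the original question; the second row is deliberately all NA.

```r
# Hypothetical survey data: one row per respondent, one column per question.
df <- data.frame(
    q1 = c(1, NA, 3),
    q2 = c(NA, NA, 2),
    q3 = c(5, NA, NA)
)

f0 <- function(df) {
    # column index where a value is present, 0 where it is NA
    idx <- ifelse(is.na(df), 0L, col(df))
    # largest index per row = last answered column (0 if none)
    apply(idx, 1, max)
}

last <- f0(df)
# Row 1 was last answered in column 3, row 2 has no answers (0),
# and row 3 was last answered in column 2.
print(unname(last))
```

The 0 for the all-NA row is what makes the function safe to use without special-casing empty rows.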

The L makes the 0 an integer rather than a numeric, so idx stays an integer matrix. For a speed improvement (when there are many rows), use the matrixStats package:

f1 = function(df) {
    idx = ifelse(is.na(df), 0L, col(df))
    matrixStats::rowMaxs(idx, na.rm=TRUE)
}

Follow markus' suggestion to use this in a dplyr context:

library(dplyr)
mutate(df, lastqnum = f1(df), lastq = c(NA, names(df))[lastqnum + 1])
df %>% mutate(lastqnum = f1(.), lastq = c(NA, names(.))[lastqnum + 1])

or just do it in base R:

lastqnum = f1(df)
cbind(df, lastq=c(NA, names(df))[lastqnum + 1], lastqnum)
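The `c(NA, names(df))[lastqnum + 1]` lookup used above may look odd at first. A minimal sketch of why the offset is there, using made-up question names and indices rather than anything from the original data:

```r
# Hypothetical column names and last-answered indices (0 = no answers).
question_names <- c("q1", "q2", "q3")
lastqnum <- c(3L, 0L, 2L)

# Indexing with 0 silently drops that element, so the result is too short.
naive <- question_names[lastqnum]

# Prepending NA and shifting every index by 1 maps 0 to NA instead.
lookup <- c(NA, question_names)
lastq <- lookup[lastqnum + 1]

print(length(naive))  # shorter than lastqnum
print(lastq)          # one entry per row, NA where there were no answers
```

Without the shift, the name vector would no longer line up with the rows of the data frame.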

Edited after acceptance: I guess the tidy approach would be first to tidy the data into long form

library(tidyr)
df1 = cbind(gather(df), id = as.vector(row(df)), event = as.vector(col(df)))

and then to group and summarize

group_by(df1, id) %>%
    summarize(lastq = tail(event[!is.na(value)], 1), lastqname = key[lastq])

This doesn't handle the case when there are no answers.
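One possible way to cover the no-answer case in the tidy version is to reuse the same 0-and-offset idea as in f0. This is a sketch, not part of the original answer; the example data frame is hypothetical, and it assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical data; the second row has no answers at all.
df <- data.frame(
    q1 = c(1, NA, 3),
    q2 = c(NA, NA, 2),
    q3 = c(5, NA, NA)
)

# Long form: one row per (respondent, question) pair.
df1 <- cbind(gather(df), id = as.vector(row(df)), event = as.vector(col(df)))

res <- group_by(df1, id) %>%
    summarize(
        # max() over 0 and the answered positions gives 0 for an empty row
        lastq = max(0L, event[!is.na(value)]),
        # shift by one so that 0 maps to NA rather than being dropped
        lastqname = c(NA, key)[lastq + 1]
    )

print(res)
```

Every id then keeps exactly one row in the summary, with lastq equal to 0 and lastqname equal to NA when nothing was answered.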
