Return last data frame column which is not NA

Submitted by 妖精的绣舞 on 2020-08-07 05:53:53

Question


I have a dataset of multiple cases, each stamped either 1 or NA in a series of numbered columns. I'm trying to find, for each case, the highest-numbered stamp that is not NA.

Here are some sample data:

PIN <- c("case1", "case2", "case3", "case4", "case5")
STAMP_1 <- c(1, 1, 1, 1, 1)
STAMP_2 <- c(NA, 1, 1, NA, 1)
STAMP_3 <- c(1, NA, 1, 1, NA)
STAMP_4 <- c(NA, NA, 1, 1, NA)
STAMP_5 <- c(1, NA, NA, 1, NA)
data <- data.frame(PIN, STAMP_1, STAMP_2, STAMP_3, STAMP_4, STAMP_5)

I'd like to return a data frame with two columns: the cases ("case1", "case2", "case3", "case4", "case5") and, for each case, its last non-NA stamp ("STAMP_5", "STAMP_2", "STAMP_4", "STAMP_5", "STAMP_2" for the sample data).


Answer 1:


Here is a method using max.col, is.na and names. max.col returns, for each row, the position of the column holding the maximum value. Here we feed it !is.na(data[-1]), a logical matrix that is TRUE wherever a stamp is present, and set ties.method="last" so that the last non-NA position is chosen. That position is then used to index names(data)[-1].

data.frame(PIN=data$PIN,
           stamp=names(data)[-1][max.col(!is.na(data[-1]), ties.method="last")])
    PIN   stamp
1 case1 STAMP_5
2 case2 STAMP_2
3 case3 STAMP_4
4 case4 STAMP_5
5 case5 STAMP_2
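
To see why this works, the intermediate steps can be inspected one at a time. The following is a minimal sketch using the sample data from the question; the names stamps, present and last_pos are introduced here for illustration only and are not part of the original answer.

```r
# Sample data from the question
data <- data.frame(
  PIN     = c("case1", "case2", "case3", "case4", "case5"),
  STAMP_1 = c(1, 1, 1, 1, 1),
  STAMP_2 = c(NA, 1, 1, NA, 1),
  STAMP_3 = c(1, NA, 1, 1, NA),
  STAMP_4 = c(NA, NA, 1, 1, NA),
  STAMP_5 = c(1, NA, NA, 1, NA)
)

stamps  <- data[-1]        # drop the PIN column (illustrative name)
present <- !is.na(stamps)  # logical matrix: TRUE where a stamp exists

# TRUE counts as 1 and FALSE as 0, so every TRUE ties for the row maximum;
# ties.method = "last" picks the right-most of those ties.
last_pos <- max.col(present, ties.method = "last")
print(last_pos)  # 5 2 4 5 2

result <- data.frame(PIN = data$PIN, stamp = names(stamps)[last_pos])
print(result)
```

Working on the logical matrix rather than on the stamp values themselves means the approach does not depend on the stamps being exactly 1.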

If an entire row is NA, max.col returns the last position of the row (a silent failure). One way to return NA instead is a trick with NA and exponentiation: NA^0 is 1, whereas NA^1 is NA. The expression !rowSums(!is.na(...)) is TRUE for rows that are entirely NA and FALSE (or 0) for rows with at least one non-NA value, so multiplying the position by NA raised to that power leaves ordinary rows unchanged and turns the position of an all-NA row into NA.

data.frame(PIN=data$PIN,
           stamp=names(data)[-1][
                max.col(!is.na(data[-1]), ties.method="last") * NA^!rowSums(!is.na(data[-1]))])

I switched from apply(data[-1], 1, function(x) all(is.na(x))) to !rowSums(!is.na(data[-1])) after Frank's suggestion. This should be quite a bit faster than apply.
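
The sample data contains no all-NA row, so the effect of the NA^ trick is not visible there. Below is a small sketch with a hypothetical data frame (data2, invented for illustration) in which the second case has no stamps at all; all_na and pos are likewise illustrative names.

```r
# NA^0 is 1 while NA^1 is NA; logical FALSE/TRUE are coerced to 0/1
print(NA^FALSE)  # 1
print(NA^TRUE)   # NA

# Hypothetical data: case2 has no stamps at all
data2 <- data.frame(
  PIN     = c("case1", "case2"),
  STAMP_1 = c(1, NA),
  STAMP_2 = c(NA, NA),
  STAMP_3 = c(1, NA)
)

stamps <- data2[-1]
all_na <- !rowSums(!is.na(stamps))  # TRUE only where every stamp is NA

# Multiplying by 1 keeps the position; multiplying by NA blanks it out
pos <- max.col(!is.na(stamps), ties.method = "last") * NA^all_na

result2 <- data.frame(PIN = data2$PIN, stamp = names(stamps)[pos])
print(result2)
```

Indexing names() with NA yields NA, so case2 ends up with a missing stamp instead of a misleading column name.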




Answer 2:


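Using dplyr together with melt (from the reshape package):

library(reshape)  # melt()
library(dplyr)    # group_by(), slice(), n()
dat <- melt(data, id.vars = "PIN")    # long format: one row per PIN/stamp pair
dat <- na.omit(dat)                   # drop the missing stamps
dat %>% group_by(PIN) %>% slice(n())  # keep the last remaining row per PIN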

# A tibble: 5 x 3
# Groups:   PIN [5]
     PIN variable value
  <fctr>   <fctr> <dbl>
1  case1  STAMP_5     1
2  case2  STAMP_2     1
3  case3  STAMP_4     1
4  case4  STAMP_5     1
5  case5  STAMP_2     1



Answer 3:


Base R

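temp = cbind(NA, data[-1])   # prepend an all-NA column ahead of the stamps
temp = temp * col(temp)      # each non-NA stamp becomes its column index
# Replace NA with 0 so the largest value in each row marks the last non-NA stamp
data.frame(PIN = data$PIN,
           STAMP = names(temp)[max.col(m = replace(temp, is.na(temp), 0),
                                       ties.method = "first")])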
#    PIN   STAMP
#1 case1 STAMP_5
#2 case2 STAMP_2
#3 case3 STAMP_4
#4 case4 STAMP_5
#5 case5 STAMP_2


Source: https://stackoverflow.com/questions/45947603/return-last-data-frame-column-which-is-not-na
