How to select the last test without NA in R

Question

My data frame looks similar to this:

Person  W.1   W.2   W.3   W.4   W.5   
1       62    57    52    59    NA
2       49    38    60    NA    NA
3       59    34    NA    NA    NA

Is there a way to select the first and last tests that are not "NA"? I have 300 data entries. W.1 is the first test, W.2 is the second test, and W.n is the nth test. I want to compare the score of the first test with the score of the last test. For example, I want to compare:

1    62 59
2    49 60
3    59 34

But the "NA" values fall in different places for different persons. Can someone help me?

Thank you!

Answer 1:

You can use this solution:

> t(apply(d[-1],1,function(rw) rw[range(which(!is.na(rw)))]))

     [,1] [,2]
[1,]   62   59
[2,]   49   60
[3,]   59   34

where d is your data set.

How it works: apply(d[-1],1,...) scans each row of d, where d[-1] excludes the first column (Person). For each row, which(!is.na(rw)) gives the indices of the non-NA test results, range() gives the lowest and highest of those indices, and rw[...] extracts the test scores at those two positions. The final result is transposed using t(), because apply() returns the result for each row as a column.

Note that this solution will work properly even in the case of NAs in the middle of the test scores, e.g. c(NA, 57, NA, 52, NA).
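To make the point about NAs in the middle concrete, here is a minimal, self-contained sketch. It rebuilds the example data from the question and adds a fourth person whose scores are c(NA, 57, NA, 52, NA). That fourth row and the names first_last, "first" and "last" are my own additions for illustration, not part of the original data.

```r
# Example data from the question, plus a hypothetical fourth person
# whose missing values are not all at the end.
d <- data.frame(
  Person = 1:4,
  W.1 = c(62, 49, 59, NA),
  W.2 = c(57, 38, 34, 57),
  W.3 = c(52, 60, NA, NA),
  W.4 = c(59, NA, NA, 52),
  W.5 = rep(NA_real_, 4)
)

# For each row, keep the scores at the first and last non-NA positions.
first_last <- t(apply(d[-1], 1, function(rw) rw[range(which(!is.na(rw)))]))

# Drop any leftover dimnames and label the two columns (labels are illustrative).
first_last <- unname(first_last)
colnames(first_last) <- c("first", "last")
print(first_last)
```

For the fourth person this should pick 57 and 52, i.e. the second and fourth tests.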

Answer 2:

Here's a possible vectorized solution using max.col, where df is your data set. I'm assuming that the first test is never NA, though that is easy to fix otherwise:

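# Pair each row number with the column of the last non-NA value in that row
indx <- cbind(seq_len(nrow(df)), max.col(!is.na(df), ties.method = "last"))
# Column 2 is the first test (W.1); df[indx] picks the last non-NA score per row
cbind(df[, 2], df[indx])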
#      [,1] [,2]
# [1,]   62   59
# [2,]   49   60
# [3,]   59   34
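If the first test can be NA as well, one way to adapt this (a sketch, not necessarily what the answerer had in mind) is to call max.col twice on the score columns only, once with ties.method = "first" and once with "last". The fourth row of the data and the names scores, taken, rows, first_col, last_col and result are assumptions made for the example.

```r
# Example data from the question, plus a hypothetical person who missed test 1.
df <- data.frame(
  Person = 1:4,
  W.1 = c(62, 49, 59, NA),
  W.2 = c(57, 38, 34, 57),
  W.3 = c(52, 60, NA, NA),
  W.4 = c(59, NA, NA, 52),
  W.5 = rep(NA_real_, 4)
)

scores <- as.matrix(df[-1])    # score columns only, as a numeric matrix
taken  <- !is.na(scores)       # TRUE where a test was taken
rows   <- seq_len(nrow(scores))

# TRUE is the row maximum, so ties among the TRUEs are broken towards
# the first or the last column where a test was taken.
first_col <- max.col(taken, ties.method = "first")
last_col  <- max.col(taken, ties.method = "last")

# Matrix indexing: one (row, column) pair per person.
result <- cbind(first = scores[cbind(rows, first_col)],
                last  = scores[cbind(rows, last_col)])
print(result)
```

Working on df[-1] rather than df keeps the Person column from being mistaken for a first score when W.1 is missing.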

Another, similar solution uses rowSums (it assumes the NAs occur only after the last test taken):

cbind(df[, 2], df[cbind(seq_len(nrow(df)), rowSums(!is.na(df)))])
#      [,1] [,2]
# [1,]   62   59
# [2,]   49   60
# [3,]   59   34
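One thing to keep in mind with the rowSums variant: it uses the count of non-NA cells as a column index, so it only lands on the last score when there are no gaps before it. The small sketch below compares it with the max.col variant on made-up data where the second person skipped a test. The data and the names n_non_na, by_count and by_position are hypothetical.

```r
# Hypothetical data: person 2 skipped the second test.
df <- data.frame(
  Person = 1:2,
  W.1 = c(62, 49),
  W.2 = c(57, NA),
  W.3 = c(52, 60),
  W.4 = c(59, NA)
)

rows <- seq_len(nrow(df))

# Count of non-NA cells per row (Person included), used as a column index.
n_non_na <- rowSums(!is.na(df))
by_count <- df[cbind(rows, n_non_na)]

# Position of the last non-NA cell per row.
by_position <- df[cbind(rows, max.col(!is.na(df), ties.method = "last"))]

print(by_count)
print(by_position)
```

For person 2 the count-based index points at the skipped W.2 cell and returns NA, while the position-based index returns the W.3 score of 60.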

Source: https://stackoverflow.com/questions/28799753/how-to-select-the-last-one-test-without-na-in-r

Tags

select