Question:
My data frame looks like this:

    Person W.1 W.2 W.3 W.4 W.5
         1  62  57  52  59  NA
         2  49  38  60  NA  NA
         3  59  34  NA  NA  NA
Is there a way to select the first and last test scores that are not NA? I have 300 data entries, where W.1 is the first test, W.2 is the second test, and W.n is the nth test. I want to compare each person's score on the first test with their score on the last test. For the data above, I want to compare:

    1 62 59
    2 49 60
    3 59 34
But the NAs fall in different positions for different people. Can someone help me?
Thank you!
Answer 1:
You can use this solution:
    > t(apply(d[-1],1,function(rw) rw[range(which(!is.na(rw)))]))
         [,1] [,2]
    [1,]   62   59
    [2,]   49   60
    [3,]   59   34
where d is your data set.
How it works: for each row of d (rows are scanned using apply(d[-1],1,...), where d[-1] excludes the first column), get the indices of the non-NA test results (which(!is.na(rw))), then get the lowest and highest of those indices using range(), and extract the test scores at those indices (rw[...]). The final result is transposed using t().
Note that this solution works properly even when there are NAs in the middle of the test scores, e.g. c(NA, 57, NA, 52, NA).
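The steps described above can be traced one at a time. The following is a minimal sketch, assuming the data frame d is rebuilt from the table in the question; the intermediate names rw, idx, ends, res and gap are chosen here for illustration only.

```r
# Sample data reconstructed from the question (column names as given there).
d <- data.frame(
  Person = 1:3,
  W.1 = c(62, 49, 59),
  W.2 = c(57, 38, 34),
  W.3 = c(52, 60, NA),
  W.4 = c(59, NA, NA),
  W.5 = c(NA_real_, NA, NA)
)

# Walk through the steps for a single row (person 2).
rw   <- unlist(d[2, -1])     # that person's scores as a named numeric vector
idx  <- which(!is.na(rw))    # positions of the non-NA scores
ends <- range(idx)           # smallest and largest of those positions
rw[ends]                     # first and last available scores

# The same steps applied to every row at once, then transposed.
res <- t(apply(d[-1], 1, function(rw) rw[range(which(!is.na(rw)))]))

# A row with gaps still yields its first and last non-NA scores.
gap <- c(NA, 57, NA, 52, NA)
gap[range(which(!is.na(gap)))]
```

Because range() only looks at the smallest and largest index, any NAs between the first and last available score are simply skipped.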
Answer 2:
Here's a possible vectorized solution using max.col (I'm assuming that the first test is never NA, though it can easily be fixed otherwise):
    indx <- cbind(seq_len(nrow(df)), max.col(!is.na(df), ties.method = "last"))
    cbind(df[, 2], df[indx])
    #      [,1] [,2]
    # [1,]   62   59
    # [2,]   49   60
    # [3,]   59   34
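Another similar solution is to use rowSums, which counts the non-NA entries in each row. That count equals the column of the last test only when all the NAs come at the end of the row:

    cbind(df[, 2], df[cbind(seq_len(nrow(df)), rowSums(!is.na(df)))])
    #      [,1] [,2]
    # [1,]   62   59
    # [2,]   49   60
    # [3,]   59   34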
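The max.col answer assumes the first test is never NA. One way that assumption might be relaxed is to call max.col twice, once with ties.method = "first" and once with "last". This is only a sketch: the data frame below is hypothetical (person 2 misses the first test, person 3 has a gap), and dropping the Person column before indexing is a choice made here, not part of the original answer.

```r
# Hypothetical data: person 2 missed the first test, person 3 has a gap.
df <- data.frame(
  Person = 1:3,
  W.1 = c(62, NA, 59),
  W.2 = c(57, 38, NA),
  W.3 = c(52, 60, 34),
  W.4 = c(59, NA, NA)
)

scores  <- as.matrix(df[-1])      # test scores only, as a numeric matrix
present <- !is.na(scores)         # TRUE where a score exists
rows    <- seq_len(nrow(scores))

# Every TRUE ties for the row maximum, so the tie rule picks the position.
first_col <- max.col(present, ties.method = "first")  # leftmost non-NA column
last_col  <- max.col(present, ties.method = "last")   # rightmost non-NA column

res <- cbind(first = scores[cbind(rows, first_col)],
             last  = scores[cbind(rows, last_col)])

# For comparison: counting non-NA values does not locate the last score
# when a row has leading or interior NAs.
by_count <- scores[cbind(rows, rowSums(present))]
```

With this data, res pairs each person's earliest available score with their latest one, while by_count goes wrong for persons 2 and 3 because their missing values are not all at the end of the row.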