Comparing values longitudinally in R… with a twist

Submitted by 两盒软妹~` on 2019-12-12 18:05:44

Question


I have the results of a test taken by a number of individuals at as many as four time periods. Here's a sample:

dat <- structure(list(Participant_ID = c("A", "A", "A", "A", "B", "B", 
"B", "B", "C", "C", "C", "C"), phase = structure(c(1L, 2L, 3L, 
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("base", "sixmos", 
"twelvemos", "eighteenmos"), class = "factor"), result = c("Negative", 
"Negative", "Negative", "Negative", "Negative", "Positive", "Negative", 
NA, "Positive", "Indeterminate", "Negative", "Negative")), .Names = c("Participant_ID", 
"phase", "result"), row.names = c(1L, 2L, 3L, 4L, 97L, 98L, 99L, 
100L, 9L, 10L, 11L, 12L), class = c("cast_df", "data.frame"))

which looks like:

    Participant_ID       phase        result
1                A        base      Negative
2                A      sixmos      Negative
3                A   twelvemos      Negative
4                A eighteenmos      Negative
97               B        base      Negative
98               B      sixmos      Positive
99               B   twelvemos      Negative
100              B eighteenmos          <NA>
9                C        base      Positive
10               C      sixmos Indeterminate
11               C   twelvemos      Negative
12               C eighteenmos      Negative

I'd like to add an identifier to each test noting whether it was a conversion from the previous status (negative to positive), a reversion (positive to negative), or stable. The catch is that I'm not simply comparing the base test with the six-month test, six months with twelve months, and so on. In cases like C, the sixmos test should be marked as stable or inconclusive (the exact term for that is not settled), and, more importantly, the twelvemos test should then be compared with the base test and marked as a reversion. Conversely, if someone had the sequence "Negative", "Indeterminate", "Negative", that sequence should count as stable.
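One way to express this rule is to walk through each participant's tests in phase order while remembering the last definite (Negative or Positive) result, and to compare each new definite result with that remembered value instead of with the immediately preceding test. The base-R sketch below does this. The function name `classify_changes` is hypothetical, and two behaviours are assumptions because the question does not specify them: the first test is labelled NA, and a missing result is labelled NA without changing the remembered reference.

```r
## Hypothetical helper: label each test relative to the last *definite*
## (Negative/Positive) result for the same participant.
## Assumptions: results are in phase order; the first test gets NA;
## a missing result gets NA and does not update the reference;
## "Indeterminate" is labelled "stable or inconclusive".
classify_changes <- function(result) {
  status <- rep(NA_character_, length(result))
  last <- NA_character_              # last definite result seen so far
  for (i in seq_along(result)) {
    r <- result[i]
    if (is.na(r)) next               # missing test: leave NA, keep reference
    if (r == "Indeterminate") {
      status[i] <- "stable or inconclusive"
      next                           # does not change the reference result
    }
    if (!is.na(last)) {
      if (r == last) {
        status[i] <- "stable"
      } else if (r == "Positive") {
        status[i] <- "conversion"
      } else {
        status[i] <- "reversion"
      }
    }
    last <- r                        # this definite result becomes the reference
  }
  status
}

## participant C: NA, "stable or inconclusive", "reversion", "stable"
classify_changes(c("Positive", "Indeterminate", "Negative", "Negative"))
## Negative, Indeterminate, Negative: NA, "stable or inconclusive", "stable"
classify_changes(c("Negative", "Indeterminate", "Negative"))
```

Because the reference only moves on definite results, twelvemos for C is compared with base, exactly as the question describes; applying it per participant is then a grouped operation (for example with `ave()` after sorting by participant and phase).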

It's the latter part that I'm stuck on. If it were just a sequence of comparisons with the previous test for each participant, I'd be fine, but I'm having trouble finding an elegant way to handle these variable comparison pairs. Your help is, as always, much appreciated.


Answer 1:


I don't think you've specified what should happen in every possible case (e.g. what is the status when the sequence is "Indeterminate, Indeterminate"?), but here is an idea: treat the "Indeterminate" results as missing and "impute" them with na.locf from the zoo package, which carries the last observed value forward. (Or, better, reimplement it to fit your case.)
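The carry-forward step is small enough to reimplement in base R, as suggested above. The sketch below is a stand-in for `zoo::na.locf(x, na.rm = FALSE)` (the helper name `locf` is hypothetical): it uses `cummax()` over the positions of non-missing values to find, for every element, the most recent observed value. Once Negative is coded as 0, Positive as 1 and Indeterminate as NA, filling participant C's series lets `diff()` compare twelvemos with base.

```r
## Hypothetical base-R stand-in for zoo::na.locf(x, na.rm = FALSE):
## each NA is replaced by the most recent non-NA value before it;
## leading NAs stay NA, so the output has the same length as the input.
locf <- function(x) {
  idx <- ifelse(is.na(x), 0L, seq_along(x))  # position of each observed value
  idx <- cummax(idx)                         # latest observed position so far
  idx[idx == 0L] <- NA_integer_              # nothing observed yet: stays NA
  x[idx]
}

## Participant C, coded Negative = 0, Positive = 1, Indeterminate = NA
result <- c("Positive", "Indeterminate", "Negative", "Negative")
coded  <- unname(c(Negative = 0, Positive = 1)[result])
filled <- locf(coded)
filled        # 1 1 0 0: sixmos inherits the base result
diff(filled)  # 0 -1 0: twelvemos is effectively compared with base
```

Keeping leading NAs (rather than dropping them) matters here, because the filled vector is later subtracted element by element from itself shifted by one.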

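library(plyr)  # ddply()
library(zoo)   # na.locf(); car::recode() below also needs the car package installed

## sort each participant's tests chronologically (the phase levels are in time order)
dat <- dat[with(dat, order(Participant_ID, phase)), ]
dat <- ddply(dat, "Participant_ID", function(x) {
    ## code results as numbers; "Indeterminate" becomes NA so it is filled in from the
    ## last definite result, and a truly missing result gets the sentinel 1000
    ## (still have to decide what to do with missing data)
    result.fix <- na.locf(car::recode(x$result, "'Negative'=0; 'Positive'=1; 'Indeterminate'=NA; NA=1000"),
                          na.rm = FALSE)  # keep leading NAs so the lengths still match
    x$status <- NA
    ## difference between each test and the previous (filled-in) test
    x$status[-1] <- result.fix[-1] - result.fix[-length(result.fix)]
    x$status <- car::recode(x$status, "-1='reversion'; 1='conversion'; 0='stable'; else=NA")
    x$status[x$result == "Indeterminate"] <- "stable or inconclusive"
    x
})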
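For comparison, here is a sketch of the same approach using only base R: `ave()` replaces `ddply`, lookups on named vectors replace `car::recode`, and a `cummax()` trick replaces `na.locf`. It assumes a simplified copy of `dat` built with `data.frame()` (without the `cast_df` class or the original row names), and `label_changes` is a hypothetical helper name. Like the answer, it uses 1000 as a sentinel for missing results so that any difference involving them falls outside -1/0/1 and is labelled NA.

```r
## Simplified copy of the question's data (plain data.frame, no cast_df class)
dat <- data.frame(
  Participant_ID = rep(c("A", "B", "C"), each = 4),
  phase = factor(rep(c("base", "sixmos", "twelvemos", "eighteenmos"), 3),
                 levels = c("base", "sixmos", "twelvemos", "eighteenmos")),
  result = c("Negative", "Negative", "Negative", "Negative",
             "Negative", "Positive", "Negative", NA,
             "Positive", "Indeterminate", "Negative", "Negative"),
  stringsAsFactors = FALSE
)

## Hypothetical helper: label one participant's results (in phase order)
label_changes <- function(result) {
  ## Negative = 0, Positive = 1, Indeterminate = NA (to be carried forward)
  coded <- unname(c(Negative = 0, Positive = 1, Indeterminate = NA)[result])
  coded[is.na(result)] <- 1000   # sentinel: comparisons with it become NA
  ## last observation carried forward, keeping leading NAs
  idx <- cummax(ifelse(is.na(coded), 0L, seq_along(coded)))
  idx[idx == 0L] <- NA_integer_
  filled <- coded[idx]
  ## change relative to the previous (filled-in) test
  delta <- c(NA, diff(filled))
  labels <- c("-1" = "reversion", "0" = "stable", "1" = "conversion")
  status <- unname(labels[as.character(delta)])
  status[result %in% "Indeterminate"] <- "stable or inconclusive"
  status
}

dat <- dat[order(dat$Participant_ID, dat$phase), ]
dat$status <- ave(dat$result, dat$Participant_ID, FUN = label_changes)
dat
```

For participant C this yields NA, "stable or inconclusive", "reversion", "stable", and for B the missing eighteenmos test stays NA.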

Not sure this qualifies as elegant, though!



Source: https://stackoverflow.com/questions/2789916/comparing-values-longitudinally-in-r-with-a-twist
