R: Selecting first of n consecutive rows above a certain threshold value

迷失自我 2020-12-10 20:34

I have a data frame with MRN, dates, and a test value.

I need to select, per MRN, the first row of the first run of three consecutive rows whose test value is above a certain threshold.

佛祖请我去吃肉 (OP)

2020-12-10 21:12

Here's a ddply solution (from the plyr package; sorry, I'm not up to date with the %>% syntax, but the same approach could probably be applied there).

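I'm not sure it's "elegant" in the sense you mean, but it makes sense on a second reading (which matters more to me than a one-liner), and it is robust to missing dates etc., because it works on row order rather than on the date values themselves (so the rows need to be sorted by date within each MRN).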

The key is to use rle (run-length encoding) on the logical vector ANC >= 0.5 to find 'runs' of consecutive rows with the same TRUE/FALSE value; we want a TRUE run of length at least 3. This takes care of the 'consecutive' part. We save the result into r.
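A minimal illustration of what rle() returns for such a threshold test, using made-up ANC values for a single MRN (the vector anc and its values are hypothetical):

```r
# Made-up ANC values for one MRN, already in date order
anc <- c(0.2, 0.6, 0.7, 0.3, 0.5, 0.8, 0.9, 1.1, 0.4)

# rle() collapses the logical vector into runs of identical values
r <- rle(anc >= 0.5)
r$lengths  # 1 2 1 4 1
r$values   # FALSE TRUE FALSE TRUE FALSE
```

Note that 0.5 itself counts, because the test is >=.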

Then r.i is the index (within r, not within x) of the first run that has length 3 or more and whose value is TRUE. If there is no such run, r.i is NA.
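Continuing with the same hypothetical values, a sketch of how r.i is found, including the case where no run qualifies (which is why the answer checks !is.na(r.i)):

```r
# Same made-up ANC values for one MRN, in date order
anc <- c(0.2, 0.6, 0.7, 0.3, 0.5, 0.8, 0.9, 1.1, 0.4)
r <- rle(anc >= 0.5)

# Runs that are both long enough (>= 3) and made of TRUE values
r$lengths >= 3 & r$values    # FALSE FALSE FALSE TRUE FALSE
r.i <- which(r$lengths >= 3 & r$values)[1]
r.i                          # 4: the fourth run (TRUE, length 4)

# If no run qualifies, which() returns integer(0) and [1] gives NA
r_none <- rle(c(0.6, 0.7, 0.1, 0.9) >= 0.5)  # runs of length 2, 1, 1
no_run <- which(r_none$lengths >= 3 & r_none$values)[1]
no_run                       # NA
```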

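To get the row index in x, sum the lengths of all runs before the one we are interested in, then add 1 to get to its first row. That's sum(r$lengths[seq_len(r.i - 1)]) + 1. seq_len(r.i - 1) is used rather than 1:(r.i - 1) so that when r.i is 1 (the run starts at the first row) the sum is over nothing and the result is 0 + 1 = 1.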
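A worked check of the start-row arithmetic on the same made-up vector. It also shows why seq_len(r.i - 1) is safer than 1:(r.i - 1): when the qualifying run is the first run (r.i = 1), 1:0 is c(1, 0), so the second form wrongly counts the first run's own length.

```r
anc <- c(0.2, 0.6, 0.7, 0.3, 0.5, 0.8, 0.9, 1.1, 0.4)
r <- rle(anc >= 0.5)
r.i <- which(r$lengths >= 3 & r$values)[1]   # 4

# Rows covered by the runs before run r.i: 1 + 2 + 1 = 4, so the run starts at row 5
start <- sum(r$lengths[seq_len(r.i - 1)]) + 1
start        # 5
anc[start]   # 0.5, the first of 0.5, 0.8, 0.9, 1.1

# Edge case: the qualifying run is the very first run (r.i == 1)
r1 <- rle(c(0.9, 0.8, 0.7, 0.1) >= 0.5)      # lengths 3 1, values TRUE FALSE
i1 <- which(r1$lengths >= 3 & r1$values)[1]  # 1
good <- sum(r1$lengths[seq_len(i1 - 1)]) + 1 # seq_len(0) is empty, so 1
bad  <- sum(r1$lengths[1:(i1 - 1)]) + 1      # 1:0 is c(1, 0), so 3 + 1 = 4 (wrong)
c(good = good, bad = bad)
```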

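```r
library(plyr)  # provides ddply() and .()

# df is assumed to be sorted by date within each MRN
ddply(df, .(MRN), function(x) {
  # find 'runs' of x$ANC >= 0.5
  r <- rle(x$ANC >= 0.5)
  # find index of first run of length >= 3 with ANC >= 0.5
  r.i <- which(r$lengths >= 3 & r$values)[1]
  if (!is.na(r.i)) {
    # get index of first row in that run and return it
    return(x[sum(r$lengths[seq_len(r.i - 1)]) + 1, ])
  }
  # no qualifying run for this MRN: drop it from the result
  return(NULL)
})
```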

It will make more sense if you extract a single group, e.g. x <- subset(df, MRN == '001'), and step through it to see what r and r.i look like.
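A sketch of that step-through on a small hypothetical data frame (the column names MRN, date, ANC and all values are assumptions), followed by the same per-MRN logic in base R using split(), lapply() and do.call(rbind, ...) in place of plyr::ddply. The base R version is swapped in for illustration only; first_run_row is a hypothetical helper, not part of any package.

```r
# Hypothetical example data: one row per test result
df <- data.frame(
  MRN  = c(rep("001", 6), rep("002", 5)),
  date = as.Date("2020-01-01") + c(0:5, 0:4),
  ANC  = c(0.2, 0.6, 0.7, 0.8, 0.9, 0.3,
           0.6, 0.4, 0.5, 0.5, 0.2),
  stringsAsFactors = FALSE
)
# Sort by patient and date so "consecutive" means consecutive tests
df <- df[order(df$MRN, df$date), ]

# Step through one patient, as suggested
x <- subset(df, MRN == "001")
r <- rle(x$ANC >= 0.5)
r                                           # lengths 1 4 1, values FALSE TRUE FALSE
r.i <- which(r$lengths >= 3 & r$values)[1]  # 2
x[sum(r$lengths[seq_len(r.i - 1)]) + 1, ]   # the 2020-01-02 row, ANC 0.6

# Hypothetical helper: first row of the first run of at least n
# consecutive rows with ANC >= threshold, or NULL if there is none
first_run_row <- function(x, threshold = 0.5, n = 3) {
  r <- rle(x$ANC >= threshold)
  r.i <- which(r$lengths >= n & r$values)[1]
  if (is.na(r.i)) return(NULL)
  x[sum(r$lengths[seq_len(r.i - 1)]) + 1, ]
}

# Apply per MRN and stack the results (NULL results are dropped by rbind)
res <- do.call(rbind, lapply(split(df, df$MRN), first_run_row))
res
```

MRN 002 does not appear in res, because its longest run at or above 0.5 has length 2, so its helper call returns NULL.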
