R: Subset a data frame based on times that are within a certain number of minutes of an observation window

梦想与她 提交于 2019-12-12 19:42:05

问题


Let's say I have a data frame with start and end time columns, a measurement column and a time of measurement column, like so:

     start         end    value                time
   9:01:00     9:02:00     30.6  2013-03-25 9:05:00
   9:01:00     9:02:00     30.8  2013-03-25 9:15:00
   9:46:00     9:46:00     28.2  2013-03-25 9:43:00
   9:46:00     9:46:00     28.9  2013-03-25 9:53:00
  10:54:00    10:59:00     13.4 2013-03-25 10:56:00
  10:54:00    10:59:00     13.8 2013-03-25 11:56:00

How might one subset this data frame to include only rows for which the time column is within the start and end time or ten minutes before the start time and ten minutes after the end time. I'm choosing ten minutes arbitrarily, and would like to know how to do this for any amount of time before and after the start and end time.

The resulting data frame would be as follows:

     start         end    value                time
   9:01:00     9:02:00     30.6  2013-03-25 9:05:00
   9:46:00     9:46:00     28.2  2013-03-25 9:43:00
   9:46:00     9:46:00     28.9  2013-03-25 9:53:00
  10:54:00    10:59:00     13.4 2013-03-25 10:56:00

Is there a way to do this other than by subtracting/adding x number of minutes from the start/end column entries, and then subsetting based on whether or not the time column falls between these expanded windows?

Currently, I have convert my time columns into POSIXlt format. Unfortunately, this gives todays date to the times in the start and end column.

here is the dput for the first data frame:

structure(list(start = structure(list(sec = c(0, 0, 0, 0, 0, 
0), min = c(1L, 1L, 46L, 46L, 54L, 54L), hour = c(9L, 9L, 9L, 
9L, 10L, 10L), mday = c(7L, 7L, 7L, 7L, 7L, 7L), mon = c(7L, 
7L, 7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 
218L, 218L, 218L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt")), end = structure(list(sec = c(0, 
0, 0, 0, 0, 0), min = c(2L, 2L, 46L, 46L, 59L, 59L), hour = c(9L, 
9L, 9L, 9L, 10L, 10L), mday = c(7L, 7L, 7L, 7L, 7L, 7L), mon = c(7L, 
7L, 7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 
218L, 218L, 218L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt")), value = c(30.6, 30.8, 28.2, 
28.9, 13.4, 13.8), time = structure(list(sec = c(0, 0, 0, 0, 
0, 0), min = c(5L, 15L, 43L, 53L, 56L, 56L), hour = c(9L, 9L, 
9L, 9L, 10L, 11L), mday = c(25L, 25L, 25L, 25L, 25L, 25L), mon = c(2L, 
2L, 2L, 2L, 2L, 2L), year = c(113L, 113L, 113L, 113L, 113L, 113L
), wday = c(1L, 1L, 1L, 1L, 1L, 1L), yday = c(83L, 83L, 83L, 
83L, 83L, 83L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt"))), .Names = c("start", "end", 
"value", "time"), row.names = c(NA, -6L), class = "data.frame")

here is the dput for the second data frame

structure(list(start = structure(list(sec = c(0, 0, 0, 0), min = c(1L, 
46L, 46L, 54L), hour = c(9L, 9L, 9L, 10L), mday = c(7L, 7L, 7L, 
7L), mon = c(7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 218L), 
    isdst = c(1L, 1L, 1L, 1L)), .Names = c("sec", "min", "hour", 
"mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt", 
"POSIXt")), end = structure(list(sec = c(0, 0, 0, 0), min = c(2L, 
46L, 46L, 59L), hour = c(9L, 9L, 9L, 10L), mday = c(7L, 7L, 7L, 
7L), mon = c(7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 218L), 
    isdst = c(1L, 1L, 1L, 1L)), .Names = c("sec", "min", "hour", 
"mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt", 
"POSIXt")), value = c(30.6, 28.2, 28.9, 13.4), time = structure(list(
    sec = c(0, 0, 0, 0), min = c(5L, 43L, 53L, 56L), hour = c(9L, 
    9L, 9L, 10L), mday = c(25L, 25L, 25L, 25L), mon = c(2L, 2L, 
    2L, 2L), year = c(113L, 113L, 113L, 113L), wday = c(1L, 1L, 
    1L, 1L), yday = c(83L, 83L, 83L, 83L), isdst = c(1L, 1L, 
    1L, 1L)), .Names = c("sec", "min", "hour", "mday", "mon", 
"year", "wday", "yday", "isdst"), class = c("POSIXlt", "POSIXt"
))), .Names = c("start", "end", "value", "time"), row.names = c(NA, 
-4L), class = "data.frame")

回答1:


Building on @EliGurarie's answer:

#dat <- ....see original question

Convert the times to POSIX representations and do the maths:

datestem <- as.character(as.Date(dat$time))
dat$start <- as.POSIXct(paste(datestem,format(dat$start,"%H:%M:%S")))
dat$end <- as.POSIXct(paste(datestem,format(dat$end,"%H:%M:%S")))

dat[
     with(
      dat,
      difftime(start,time,units="mins") > -10 &
      difftime(end,time,units="mins") < 10
     ),
   ]

Alternatively, use a bit of rounding and some intermediate variables:

min10 <- 10/(60*24)
ds <- difftime(dat$start,dat$time,units="days")
ds <- dd - round(dd) 
de <- difftime(dat$end,dat$time,units="days")
de <- de - round(de) 

dat[ds > -min10 & de < min10,]



回答2:


No fun to recreate, but the answer should be as simple as:

data[with(data, time > start - 10*60 & time < end + 10*60),]

That assumes that that start, end and time objects are all actually comparable (i.e. corresponding year and date) - otherwise just convert the substring that corresponds to the time of day to a POSIX.

UPDATE: Ok, since your dates are off, you need to recreate them to "synchronize", e.g.:

data$start <- as.POSIXct(substr(data$start,12,19), format="%H:%M:%S")
data$end <- as.POSIXct(substr(data$end,12,19), format="%H:%M:%S")
data$time <- as.POSIXct(substr(data$time,12,19), format="%H:%M:%S")

Now, the line above gives what you want. Probably, you should be careful about how you encode the POSIX from your raw data off the bat. Also, for most applications a POSIXct might be preferred to the POSIXlt - where each element is a list. This can gum up (or slow down) some operations later down the line.



来源:https://stackoverflow.com/questions/18092081/r-subset-a-data-frame-based-on-times-that-are-within-a-certain-number-of-minute

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!