Subsetting data by multiple date ranges - R

Posted by 对着背影说爱祢 on 2019-12-11 10:50:29

Question


I'll get straight to the point: I have been given some data sets in .csv format containing regularly logged sensor data from a machine. However, this data set also contains measurements taken when the machine is turned off, which I would like to separate from the data logged from when it is turned on. To subset the relevant data I also have a file containing start and end times of these shutdowns. This file is several hundred rows long.

Examples of the relevant files for this problem:

file: sensor_data.csv

sens_name,time,measurement
sens_A,17/12/11 06:45,32.3321
sens_A,17/12/11 08:01,36.1290
sens_B,17/12/11 05:32,17.1122
sens_B,18/12/11 03:43,12.3189

##################################################

file: shutdowns.csv

shutdown_start,shutdown_end
17/12/11 07:46,17/12/11 08:23
17/12/11 08:23,17/12/11 09:00
17/12/11 09:00,17/12/11 13:30
18/12/11 01:42,18/12/11 07:43

To subset data in R, I have previously used the subset() function with simple conditions which has worked fine, but I don't know how to go about subsetting sensor data which fall outside multiple shutdown date ranges. I've already formatted the date and time data using as.POSIXlt().

I'm suspecting some scripting may be involved to come up with a good solution, but I'm afraid I am not yet experienced enough to handle this type of data.

Any help, advice, or solutions will be greatly appreciated. Let me know if there's anything else needed for a solution.


Answer 1:


I prefer the POSIXct format for date ranges within data frames. We create an index that is TRUE for measurements taken outside every shutdown window, i.e. where t < shutdown_start OR t > shutdown_end holds for all rows of shutdowns. With this index we can then subset the data as necessary:

posixct <- function(x) as.POSIXct(x, format="%d/%m/%y %H:%M")

sensor_data$time <- posixct(sensor_data$time)
shutdowns[] <- lapply(shutdowns, posixct)

# TRUE when a timestamp lies outside every shutdown window (machine running)
ind1 <- sapply(sensor_data$time, function(t) {
  all(t < shutdowns[,1] | t > shutdowns[,2])})

#Measurements taken while the machine was running (outside every shutdown)
sensor_data[ind1,]
#   sens_name                time measurement
# 1    sens_A 2011-12-17 06:45:00     32.3321
# 3    sens_B 2011-12-17 05:32:00     17.1122

#Measurements taken during a shutdown
sensor_data[!ind1,]
#   sens_name                time measurement
# 2    sens_A 2011-12-17 08:01:00     36.1290
# 4    sens_B 2011-12-18 03:43:00     12.3189
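
With several hundred shutdown rows and a long sensor log, the per-timestamp sapply() loop can be replaced by a single vectorised comparison. The following is only a sketch under a few assumptions: the sample rows from the question are read inline with read.csv(text = ...) in place of the real files, tz = "UTC" is an arbitrary choice so the parsing does not depend on the local time zone, and to_ct() and in_any_range() are hypothetical helper names, not part of base R.

```r
# Sample rows copied from the question; inline text stands in for the real CSV files.
sensor_data <- read.csv(text = "sens_name,time,measurement
sens_A,17/12/11 06:45,32.3321
sens_A,17/12/11 08:01,36.1290
sens_B,17/12/11 05:32,17.1122
sens_B,18/12/11 03:43,12.3189", stringsAsFactors = FALSE)

shutdowns <- read.csv(text = "shutdown_start,shutdown_end
17/12/11 07:46,17/12/11 08:23
17/12/11 08:23,17/12/11 09:00
17/12/11 09:00,17/12/11 13:30
18/12/11 01:42,18/12/11 07:43", stringsAsFactors = FALSE)

# Hypothetical helper: parse "dd/mm/yy HH:MM" strings (tz = "UTC" is an assumption).
to_ct <- function(x) as.POSIXct(x, format = "%d/%m/%y %H:%M", tz = "UTC")

sensor_data$time <- to_ct(sensor_data$time)
shutdowns[] <- lapply(shutdowns, to_ct)

# Hypothetical helper: TRUE where t falls inside at least one [start, end] window.
# outer() builds a length(t) x length(start) logical matrix for each bound.
in_any_range <- function(t, start, end) {
  t <- as.numeric(t)
  start <- as.numeric(start)
  end <- as.numeric(end)
  inside <- outer(t, start, ">=") & outer(t, end, "<=")
  rowSums(inside) > 0
}

during_shutdown <- in_any_range(sensor_data$time,
                                shutdowns$shutdown_start,
                                shutdowns$shutdown_end)

running <- sensor_data[!during_shutdown, ]  # machine switched on
stopped <- sensor_data[during_shutdown, ]   # logged during a shutdown
```

The comparison matrix has one cell per (measurement, shutdown) pair, so memory grows with the product of the two lengths. For very large logs, processing the sensor data in chunks or using an interval join from a package may be the better choice.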


Source: https://stackoverflow.com/questions/36357101/subsetting-data-by-multiple-date-ranges-r
