Filtering observations from time series conditionally by group

Submitted by 对着背影说爱祢 on 2019-12-13 06:39:44

Question


I have a data frame (“df”) containing multiple time series (value ~ time) whose observations are grouped by three factors: temp, rep, and species. These series need to be trimmed at their lower and upper ends, but the thresholds are group-specific (e.g. remove observations below 2 and above 10 where temp = 10, rep = 2, and species = “A”). An accompanying data frame (df_thresholds) holds the grouping values and the minimum and maximum I want to use for each group. Not all groups need trimming, and I would like to update df_thresholds regularly so that it guides where df is trimmed. Can anybody help me conditionally filter out these values by group? I have the following, which is close but not quite there. When I reverse the max and min boolean tests, I get zero observations.

df <- data.frame(species = c(rep("A", 16), rep("B", 16)),
                 temp=as.factor(c(rep(10,4),rep(20,4),rep(10,4),rep(20,4))),
                 rep=as.factor(c(rep(1,8),rep(2,8),rep(1,8),rep(2,8))),
                 time=rep(seq(1:4),4),
                 value=c(1,4,8,16,2,4,9,16,2,4,10,16,2,4,15,16,2,4,6,16,1,4,8,16,1,2,8,16,2,3,4,16))

df_thresholds <- data.frame(species=c("A", "A", "B"), 
                            temp=as.factor(c(10,20,10)),
                            rep=as.factor(c(1,1,2)), 
                            min_value=c(2,4,2),
                            max_value=c(10,10,9))

#desired outcome
df_desired <- df[c(2:3,6:7,9:24,26:27,29:nrow(df)),]


#attempt (uses dplyr for %>% and filter)
library(dplyr)
df2 <- df

for (i in 1:nrow(df_thresholds)) {  
  df2 <- df2 %>%
    filter(!(species==df_thresholds$species[i] & temp==df_thresholds$temp[i] & rep==df_thresholds$rep[i] & value>df_thresholds$min_value[i] & value<df_thresholds$max_value[i]))
}
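
The filter in the loop above drops the rows that lie inside the range (value > min & value < max), whereas the desired outcome drops the rows outside it. Below is a minimal sketch of the same loop with the test inverted, written with base R subsetting instead of dplyr so it runs without extra packages. The helper names th, in_group and out_of_range are illustrative and not part of the original post.

```r
# Rebuild the example data from the question.
df <- data.frame(species = c(rep("A", 16), rep("B", 16)),
                 temp = as.factor(c(rep(10, 4), rep(20, 4), rep(10, 4), rep(20, 4))),
                 rep = as.factor(c(rep(1, 8), rep(2, 8), rep(1, 8), rep(2, 8))),
                 time = rep(1:4, 4),
                 value = c(1,4,8,16,2,4,9,16,2,4,10,16,2,4,15,16,
                           2,4,6,16,1,4,8,16,1,2,8,16,2,3,4,16))
df_thresholds <- data.frame(species = c("A", "A", "B"),
                            temp = as.factor(c(10, 20, 10)),
                            rep = as.factor(c(1, 1, 2)),
                            min_value = c(2, 4, 2),
                            max_value = c(10, 10, 9))

df2 <- df
for (i in seq_len(nrow(df_thresholds))) {
  th <- df_thresholds[i, ]  # thresholds for one group (hypothetical helper name)
  # Compare as character so differing factor levels cannot cause an error.
  in_group <- as.character(df2$species) == as.character(th$species) &
    as.character(df2$temp) == as.character(th$temp) &
    as.character(df2$rep) == as.character(th$rep)
  # A row is dropped when it is below the minimum OR above the maximum.
  out_of_range <- df2$value < th$min_value | df2$value > th$max_value
  df2 <- df2[!(in_group & out_of_range), ]
}
```

Rows belonging to groups that are not listed in df_thresholds never match in_group, so they pass through untouched.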

EDIT: Here's the solution I implemented per suggestions below.

df_test <- left_join(df, df_thresholds, by=c('species','temp','rep'))
df_test$min_value[is.na(df_test$min_value)] <- 0
df_test$max_value[is.na(df_test$max_value)] <- 999

df_test2 <- df_test %>%
  filter(value >= min_value & value <= max_value)
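
The sentinel values 0 and 999 in the solution above only work while every real observation lies between them. As a hedged alternative, the sketch below fills missing thresholds with -Inf and Inf so that unlisted groups are never trimmed, whatever their values. It uses base R's merge() in place of left_join(), an assumption made only to keep the example free of package dependencies; row_id, cols, joined, kept and df_trimmed are illustrative names.

```r
# Rebuild the example data from the question.
df <- data.frame(species = c(rep("A", 16), rep("B", 16)),
                 temp = as.factor(c(rep(10, 4), rep(20, 4), rep(10, 4), rep(20, 4))),
                 rep = as.factor(c(rep(1, 8), rep(2, 8), rep(1, 8), rep(2, 8))),
                 time = rep(1:4, 4),
                 value = c(1,4,8,16,2,4,9,16,2,4,10,16,2,4,15,16,
                           2,4,6,16,1,4,8,16,1,2,8,16,2,3,4,16))
df_thresholds <- data.frame(species = c("A", "A", "B"),
                            temp = as.factor(c(10, 20, 10)),
                            rep = as.factor(c(1, 1, 2)),
                            min_value = c(2, 4, 2),
                            max_value = c(10, 10, 9))

cols <- names(df)                 # original columns, to restore at the end
df$row_id <- seq_len(nrow(df))    # merge() may reorder rows; remember the order

# Left join: keep every row of df, attach thresholds where a group has them.
joined <- merge(df, df_thresholds, by = c("species", "temp", "rep"), all.x = TRUE)

# Groups without thresholds get an unbounded range instead of 0 / 999.
joined$min_value[is.na(joined$min_value)] <- -Inf
joined$max_value[is.na(joined$max_value)] <- Inf

in_range <- joined$value >= joined$min_value & joined$value <= joined$max_value
kept <- joined[in_range, ]
kept <- kept[order(kept$row_id), ]   # back to the original row order
df_trimmed <- kept[, cols]
```

The same replacement of NA with -Inf and Inf can be applied to df_test in the dplyr version without changing anything else.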

Answer 1:


We can use mapply to find the indices of the rows we want to exclude:

df[-c(with(df_thresholds,
           mapply(function(x, y, z, min_x, max_x)
                    which(df$species == x & df$temp == y & df$rep == z &
                            (df$value < min_x | df$value > max_x)),
                  species, temp, rep, min_value, max_value))), ]


#   species temp rep time value
#2        A   10   1    2     4
#3        A   10   1    3     8
#6        A   20   1    2     4
#7        A   20   1    3     9
#9        A   10   2    1     2
#10       A   10   2    2     4
#11       A   10   2    3    10
#12       A   10   2    4    16
#......

We pass every column of df_thresholds to mapply. For each row of df_thresholds, the function finds the rows of df that belong to that group and whose value lies below min_value or above max_value. Those indices are then excluded from the original data frame.

The flattened result of the mapply call is

#[1]  1  4  5  8 25 28

which are the rows we want to exclude from df because their values fall outside the range for their group.
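
Two edge cases are worth guarding against with negative indexing. If the groups yield different numbers of out-of-range rows, mapply returns a list, and negating a list is an error. If no row is out of range, the index vector is empty and df[-integer(0), ] returns zero rows. The following is a sketch of a more defensive variant, assuming Map plus unlist and a logical mask are acceptable substitutes; drop_list, drop_idx, keep and df_trimmed are illustrative names, not part of the original answer.

```r
# Rebuild the example data from the question.
df <- data.frame(species = c(rep("A", 16), rep("B", 16)),
                 temp = as.factor(c(rep(10, 4), rep(20, 4), rep(10, 4), rep(20, 4))),
                 rep = as.factor(c(rep(1, 8), rep(2, 8), rep(1, 8), rep(2, 8))),
                 time = rep(1:4, 4),
                 value = c(1,4,8,16,2,4,9,16,2,4,10,16,2,4,15,16,
                           2,4,6,16,1,4,8,16,1,2,8,16,2,3,4,16))
df_thresholds <- data.frame(species = c("A", "A", "B"),
                            temp = as.factor(c(10, 20, 10)),
                            rep = as.factor(c(1, 1, 2)),
                            min_value = c(2, 4, 2),
                            max_value = c(10, 10, 9))

# Map always returns a list, one element of row indices per threshold row.
drop_list <- Map(
  function(sp, tp, rp, lo, hi) {
    which(df$species == sp & df$temp == tp & df$rep == rp &
            (df$value < lo | df$value > hi))
  },
  as.character(df_thresholds$species),
  as.character(df_thresholds$temp),
  as.character(df_thresholds$rep),
  df_thresholds$min_value,
  df_thresholds$max_value
)
drop_idx <- unlist(drop_list, use.names = FALSE)

# A logical mask keeps every row when drop_idx is empty.
keep <- !(seq_len(nrow(df)) %in% drop_idx)
df_trimmed <- df[keep, ]
```

With the example data, drop_idx holds the same six row numbers as the mapply call above, and df_trimmed matches df_desired from the question.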



Source: https://stackoverflow.com/questions/53716442/filtering-observations-from-time-series-conditionally-by-group
