Selecting a specific range of days prior to event in R

我只是一个虾纸丫 提交于 2021-02-11 12:47:33

问题


I have been learning r for a few days, and I am trying to do a specific output and I can not figure out how to select days prior to an event.

I am trying to determine how recharge events impact detections of a containment in water samples. I have data that has 7 columns (Date, month, day, year, samp, prcip, snow) where the date is YYYY/MM/DD, month, day, and year are what they say, samp has a 0, 1, or NA, and both precip and snow have daily totals for either rain or snow. There is 1 row for each day.

I want to explore rain events for different number of days prior to a sampling event. I would like to select the rows with either a 0 (sample and no detection) or 1 (sample with detection) and then select a number of days prior, say 5 for this example, and compare the mean, sum, etc of those prior days between the 0 and 1 groups.

I have found ways to select just the rows, and to get a count of how many consecutive days have either a 0 or a 1 (How to subset consecutive rows if they meet a condition), but I can not figure out how to select a number of days prior to a sampling event and create a new table.

I have over 800 sampling days spread over 20+ years, so I do not want to input each date either (Subset dataframe where date is within x days of a vector of dates in R).

I have tried using the %>% pipe, and some other selection methods and have been able to select the rows containing 0 or 1, but I don't understand how to also get the days prior to the sampling day. I am looking for any direction, suggestions, or functions/tools/packages to look up as I have come up short on new avenues to explore.

I would like to select my data so I can eventually run some simple statistics on it. This is an exploratory project for me - to learn r and practice statistics. I would like to run t-tests, and ANOVA, look at different days prior to sampling. I am selecting sample days so I can eventually ask a question such as "how does rainfall 5 days prior to a sampling event impact detections" or "how is the mean of rain 5 days prior to a positive detection different from 5 days before a negative detection". Hopefully giving you context of my purpose helps me to explain myself and what I am looking for.

How my data looks now:

date    month   day year    samp    precip  snow

11/11/1988 11 11 1988 NA 0 0 11/12/1988 11 12 1988 NA 0 0 11/13/1988 11 13 1988 NA 0.55 0 11/14/1988 11 14 1988 NA 0 0 11/15/1988 11 15 1988 NA 0 0 11/16/1988 11 16 1988 NA 0.52 0 11/17/1988 11 17 1988 NA 0 0 11/18/1988 11 18 1988 NA 0 0 11/19/1988 11 19 1988 NA 0 0 11/20/1988 11 20 1988 NA 0.39 0 11/21/1988 11 21 1988 NA 0.43 0 11/22/1988 11 22 1988 NA 0 0 11/23/1988 11 23 1988 NA 0 0 11/24/1988 11 24 1988 NA 0 0 11/25/1988 11 25 1988 NA 0 0 11/26/1988 11 26 1988 NA 0.11 0 11/27/1988 11 27 1988 NA 0.08 0 11/28/1988 11 28 1988 NA 0.01 0 11/29/1988 11 29 1988 NA 0 0 11/30/1988 11 30 1988 NA 0 0 12/1/1988 12 1 1988 NA 0 0 12/2/1988 12 2 1988 NA 0 0 12/3/1988 12 3 1988 NA 0 0 12/4/1988 12 4 1988 NA 0 0 12/5/1988 12 5 1988 NA 0 0 12/6/1988 12 6 1988 NA 0 0 12/7/1988 12 7 1988 NA 0 0 12/8/1988 12 8 1988 NA 0 0 12/9/1988 12 9 1988 NA 0 0 12/10/1988 12 10 1988 NA 0 0 12/11/1988 12 11 1988 NA 0 0 12/12/1988 12 12 1988 NA 0 0 12/13/1988 12 13 1988 NA 0.03 1 12/14/1988 12 14 1988 NA 0 0 12/15/1988 12 15 1988 NA 0 0 12/16/1988 12 16 1988 NA 0 0 12/17/1988 12 17 1988 NA 0 2 12/18/1988 12 18 1988 NA 0 0 12/19/1988 12 19 1988 NA 0 0 12/20/1988 12 20 1988 NA 0.07 0 12/21/1988 12 21 1988 NA 0.02 0 12/22/1988 12 22 1988 NA 0 0 12/23/1988 12 23 1988 NA 1.3 0 12/24/1988 12 24 1988 NA 0 0 12/25/1988 12 25 1988 NA 0 0 12/26/1988 12 26 1988 NA 0 0 12/27/1988 12 27 1988 NA 0.85 3 12/28/1988 12 28 1988 NA 0.37 3 12/29/1988 12 29 1988 NA 0 0 12/30/1988 12 30 1988 NA 0 0 12/31/1988 12 31 1988 NA 0 0 1/1/1989 1 1 1989 NA 0 0 1/2/1989 1 2 1989 NA 0 0 1/3/1989 1 3 1989 NA 0 0 1/4/1989 1 4 1989 NA 0 0 1/5/1989 1 5 1989 NA 0 0 1/6/1989 1 6 1989 NA 0.54 0 1/7/1989 1 7 1989 NA 0 0 1/8/1989 1 8 1989 NA 0.08 0 1/9/1989 1 9 1989 NA 0 0 1/10/1989 1 10 1989 NA 0 0 1/11/1989 1 11 1989 NA 0 0 1/12/1989 1 12 1989 NA 0 0 1/13/1989 1 13 1989 NA 0 0 1/14/1989 1 14 1989 NA 0 0 1/15/1989 1 15 1989 NA 0.04 1 1/16/1989 1 16 1989 NA 0 0 1/17/1989 1 17 1989 NA 0 0 1/18/1989 1 18 1989 NA 0 0 1/19/1989 1 19 1989 NA 0 0 1/20/1989 1 20 1989 NA 0 0 1/21/1989 1 21 1989 NA 0 0 1/22/1989 1 22 1989 NA 0 0 1/23/1989 1 23 1989 NA 0 0 1/24/1989 1 24 1989 NA 0 0 1/25/1989 1 25 1989 NA 0 0 1/26/1989 1 26 1989 NA 0.15 0 1/27/1989 1 27 1989 NA 0 0 1/28/1989 1 28 1989 NA 0 0 1/29/1989 1 29 1989 NA 0 0 1/30/1989 1 30 1989 NA 0 0 1/31/1989 1 31 1989 NA 0 0 2/1/1989 2 1 1989 NA 0 0 2/2/1989 2 2 1989 NA 0 0 2/3/1989 2 3 1989 NA 0.01 0 2/4/1989 2 4 1989 NA 0 0 2/5/1989 2 5 1989 NA 0.28 4 2/6/1989 2 6 1989 NA 0.21 3 2/7/1989 2 7 1989 NA 0 0 2/8/1989 2 8 1989 NA 0 0 2/9/1989 2 9 1989 NA 0 0 2/10/1989 2 10 1989 NA 0 0 2/11/1989 2 11 1989 NA 0 0 2/12/1989 2 12 1989 NA 0 0 2/13/1989 2 13 1989 NA 0.26 1 2/14/1989 2 14 1989 NA 0 0 2/15/1989 2 15 1989 NA 0.04 0 2/16/1989 2 16 1989 NA 0.03 1 2/17/1989 2 17 1989 NA 0 0 2/18/1989 2 18 1989 NA 0 0 2/19/1989 2 19 1989 NA 0 0 2/20/1989 2 20 1989 NA 0 0 2/21/1989 2 21 1989 NA 0.21 2 2/22/1989 2 22 1989 NA 0 0 2/23/1989 2 23 1989 NA 0 0 2/24/1989 2 24 1989 NA 0 0 2/25/1989 2 25 1989 NA 0 0 2/26/1989 2 26 1989 NA 0 0 2/27/1989 2 27 1989 NA 0 0 2/28/1989 2 28 1989 NA 0 0 3/1/1989 3 1 1989 1 0 0 3/2/1989 3 2 1989 NA 0 0 3/3/1989 3 3 1989 NA 0 0 3/4/1989 3 4 1989 NA 0 0 3/5/1989 3 5 1989 NA 0.34 0 3/6/1989 3 6 1989 NA 0 0 3/7/1989 3 7 1989 NA 0 0 3/8/1989 3 8 1989 NA 0 0 3/9/1989 3 9 1989 NA 0 0 3/10/1989 3 10 1989 NA 0 0 3/11/1989 3 11 1989 NA 0 0 3/12/1989 3 12 1989 NA 0 0 3/13/1989 3 13 1989 NA 0 0 3/14/1989 3 14 1989 NA 0 0 3/15/1989 3 15 1989 NA 0 0 3/16/1989 3 16 1989 NA 0 0 3/17/1989 3 17 1989 NA 0 0 3/18/1989 3 18 1989 NA 0.02 0 3/19/1989 3 19 1989 NA 0 0 3/20/1989 3 20 1989 NA 0 0 3/21/1989 3 21 1989 NA 0 0 3/22/1989 3 22 1989 NA 0 0 3/23/1989 3 23 1989 NA 0 0 3/24/1989 3 24 1989 NA 0 0 3/25/1989 3 25 1989 NA 0 0 3/26/1989 3 26 1989 NA 0 0 3/27/1989 3 27 1989 NA 0 0 3/28/1989 3 28 1989 NA 0.02 0 3/29/1989 3 29 1989 NA 0.81 0 3/30/1989 3 30 1989 NA 0 0 3/31/1989 3 31 1989 NA 0 0 4/1/1989 4 1 1989 NA 0 0 4/2/1989 4 2 1989 NA 0.05 0 4/3/1989 4 3 1989 NA 0.81 0 4/4/1989 4 4 1989 NA 0.49 0 4/5/1989 4 5 1989 NA 0 0 4/6/1989 4 6 1989 NA 0 0 4/7/1989 4 7 1989 NA 0 0 4/8/1989 4 8 1989 NA 0 0 4/9/1989 4 9 1989 NA 0.26 0 4/10/1989 4 10 1989 NA 0 0 4/11/1989 4 11 1989 NA 0 0 4/12/1989 4 12 1989 NA 0 0 4/13/1989 4 13 1989 NA 0 0 4/14/1989 4 14 1989 NA 0 0 4/15/1989 4 15 1989 NA 0 0 4/16/1989 4 16 1989 NA 0 0 4/17/1989 4 17 1989 NA 0.27 0 4/18/1989 4 18 1989 NA 0.04 0 4/19/1989 4 19 1989 NA 0 0 4/20/1989 4 20 1989 NA 0 0 4/21/1989 4 21 1989 NA 0 0 4/22/1989 4 22 1989 NA 0 0 4/23/1989 4 23 1989 NA 0 0 4/24/1989 4 24 1989 NA 0 0 4/25/1989 4 25 1989 NA 0 0 4/26/1989 4 26 1989 NA 0 0 4/27/1989 4 27 1989 NA 0.23 0 4/28/1989 4 28 1989 NA 0.28 0 4/29/1989 4 29 1989 NA 0 0 4/30/1989 4 30 1989 NA 0 0 5/1/1989 5 1 1989 NA 0 0 5/2/1989 5 2 1989 NA 0 0 5/3/1989 5 3 1989 0 0.28 0 5/4/1989 5 4 1989 NA 0 0 5/5/1989 5 5 1989 NA 0.06 0 5/6/1989 5 6 1989 NA 0 0 5/7/1989 5 7 1989 NA 0 0 5/8/1989 5 8 1989 NA 0 0 5/9/1989 5 9 1989 NA 0.42 0 5/10/1989 5 10 1989 NA 0.02 0 5/11/1989 5 11 1989 NA 0 0 5/12/1989 5 12 1989 NA 0 0 5/13/1989 5 13 1989 NA 0 0 5/14/1989 5 14 1989 NA 0 0 5/15/1989 5 15 1989 NA 0 0 5/16/1989 5 16 1989 NA 0 0 5/17/1989 5 17 1989 NA 0 0 5/18/1989 5 18 1989 NA 0 0 5/19/1989 5 19 1989 NA 0.05 0 5/20/1989 5 20 1989 NA 1.17 0 5/21/1989 5 21 1989 NA 0 0 5/22/1989 5 22 1989 NA 0 0 5/23/1989 5 23 1989 NA 0.03 0 5/24/1989 5 24 1989 NA 0 0 5/25/1989 5 25 1989 NA 0.21 0 5/26/1989 5 26 1989 NA 0.37 0 5/27/1989 5 27 1989 NA 0 0 5/28/1989 5 28 1989 NA 0 0 5/29/1989 5 29 1989 NA 0 0 5/30/1989 5 30 1989 NA 1.5 0 5/31/1989 5 31 1989 NA 0.14 0 6/1/1989 6 1 1989 1 0.97 0 6/2/1989 6 2 1989 NA 1.04 0 6/3/1989 6 3 1989 NA 0 0 6/4/1989 6 4 1989 NA 0.25 0 6/5/1989 6 5 1989 NA 0 0 6/6/1989 6 6 1989 NA 0 0 6/7/1989 6 7 1989 NA 0 0 6/8/1989 6 8 1989 NA 0 0 6/9/1989 6 9 1989 NA 0 0 6/10/1989 6 10 1989 NA 0 0 6/11/1989 6 11 1989 NA 0 0 6/12/1989 6 12 1989 NA 0.32 0 6/13/1989 6 13 1989 NA 0.16 0 6/14/1989 6 14 1989 NA 0 0

How I would like my data to look once it is completed:

date month day year samp precip snow 2/24/1989 2 24 1989 NA 0 0 2/25/1989 2 25 1989 NA 0 0 2/26/1989 2 26 1989 NA 0 0 2/27/1989 2 27 1989 NA 0 0 2/28/1989 2 28 1989 NA 0 0 3/1/1989 3 1 1989 1 0 0 4/28/1989 4 28 1989 NA 0.28 0 4/29/1989 4 29 1989 NA 0 0 4/30/1989 4 30 1989 NA 0 0 5/1/1989 5 1 1989 NA 0 0 5/2/1989 5 2 1989 NA 0 0 5/3/1989 5 3 1989 0 0.28 0 5/27/1989 5 27 1989 NA 0 0 5/28/1989 5 28 1989 NA 0 0 5/29/1989 5 29 1989 NA 0 0 5/30/1989 5 30 1989 NA 1.5 0 5/31/1989 5 31 1989 NA 0.14 0 6/1/1989 6 1 1989 1 0.97 0


回答1:


Here is a solution using which(!is.na()) and purrr::map. These are good sources to learn more about the tidyverse and purrr.

library(tidyverse)

str(dat)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    216 obs. of  7 variables:
#>  $ date  : chr  "11/11/1988" "11/12/1988" "11/13/1988" "11/14/1988" ...
#>  $ month : int  11 11 11 11 11 11 11 11 11 11 ...
#>  $ day   : int  11 12 13 14 15 16 17 18 19 20 ...
#>  $ year  : int  1988 1988 1988 1988 1988 1988 1988 1988 1988 1988 ...
#>  $ samp  : int  NA NA NA NA NA NA NA NA NA NA ...
#>  $ precip: num  0 0 0.55 0 0 0.52 0 0 0 0.39 ...
#>  $ snow  : int  0 0 0 0 0 0 0 0 0 0 ...

# Extra: convert date from character to date format
dat <- dat %>% 
  mutate(date = as.Date(date, "%m/%d/%Y"))

Find the row locations at which samp is not NA

idx <- which(!is.na(dat$samp))
idx
#> [1] 111 174 203

Next we loop through these row indices then extract values 5 days prior to them

idx %>% 
  map(. , function(x) dat[(x-5):(x), ])

#> [[1]]
#> # A tibble: 6 x 7
#>   date       month   day  year  samp precip  snow
#>   <date>     <int> <int> <int> <int>  <dbl> <int>
#> 1 1989-02-24     2    24  1989    NA     0.     0
#> 2 1989-02-25     2    25  1989    NA     0.     0
#> 3 1989-02-26     2    26  1989    NA     0.     0
#> 4 1989-02-27     2    27  1989    NA     0.     0
#> 5 1989-02-28     2    28  1989    NA     0.     0
#> 6 1989-03-01     3     1  1989     1     0.     0
#> 
#> [[2]]
#> # A tibble: 6 x 7
#>   date       month   day  year  samp precip  snow
#>   <date>     <int> <int> <int> <int>  <dbl> <int>
#> 1 1989-04-28     4    28  1989    NA  0.280     0
#> 2 1989-04-29     4    29  1989    NA  0.        0
#> 3 1989-04-30     4    30  1989    NA  0.        0
#> 4 1989-05-01     5     1  1989    NA  0.        0
#> 5 1989-05-02     5     2  1989    NA  0.        0
#> 6 1989-05-03     5     3  1989     0  0.280     0
#> 
#> [[3]]
#> # A tibble: 6 x 7
#>   date       month   day  year  samp precip  snow
#>   <date>     <int> <int> <int> <int>  <dbl> <int>
#> 1 1989-05-27     5    27  1989    NA  0.        0
#> 2 1989-05-28     5    28  1989    NA  0.        0
#> 3 1989-05-29     5    29  1989    NA  0.        0
#> 4 1989-05-30     5    30  1989    NA  1.50      0
#> 5 1989-05-31     5    31  1989    NA  0.140     0
#> 6 1989-06-01     6     1  1989     1  0.970     0

If we want the result in a data frame

idx %>% 
  map_df(. , function(x) dat[(x-5):(x), ])

#> # A tibble: 18 x 7
#>    date       month   day  year  samp precip  snow
#>    <date>     <int> <int> <int> <int>  <dbl> <int>
#>  1 1989-02-24     2    24  1989    NA  0.        0
#>  2 1989-02-25     2    25  1989    NA  0.        0
#>  3 1989-02-26     2    26  1989    NA  0.        0
#>  4 1989-02-27     2    27  1989    NA  0.        0
#>  5 1989-02-28     2    28  1989    NA  0.        0
#>  6 1989-03-01     3     1  1989     1  0.        0
#>  7 1989-04-28     4    28  1989    NA  0.280     0
#>  8 1989-04-29     4    29  1989    NA  0.        0
#>  9 1989-04-30     4    30  1989    NA  0.        0
#> 10 1989-05-01     5     1  1989    NA  0.        0
#> 11 1989-05-02     5     2  1989    NA  0.        0
#> 12 1989-05-03     5     3  1989     0  0.280     0
#> 13 1989-05-27     5    27  1989    NA  0.        0
#> 14 1989-05-28     5    28  1989    NA  0.        0
#> 15 1989-05-29     5    29  1989    NA  0.        0
#> 16 1989-05-30     5    30  1989    NA  1.50      0
#> 17 1989-05-31     5    31  1989    NA  0.140     0
#> 18 1989-06-01     6     1  1989     1  0.970     0

More compacted form where you replace function(x) & x with "~" & "."

idx %>% 
  map_df(~ dat[(. -5):(.), ])

Created on 2018-03-12 by the reprex package (v0.2.0).



来源:https://stackoverflow.com/questions/49242430/selecting-a-specific-range-of-days-prior-to-event-in-r

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!