R: calculate the number of occurrences of a specific event in a specified time future

坚强是说给别人听的谎言 提交于 2019-12-29 08:48:08

问题


my simplified data looks like this:

set.seed(1453); x = sample(0:1, 10, TRUE)
date = c('2016-01-01', '2016-01-05', '2016-01-07',  '2016-01-12',  '2016-01-16',  '2016-01-20',
             '2016-01-20',  '2016-01-25',  '2016-01-26',  '2016-01-31')


df = data.frame(x, date = as.Date(date))


df 
x       date
1 2016-01-01
0 2016-01-05
1 2016-01-07
0 2016-01-12
0 2016-01-16
1 2016-01-20
1 2016-01-20
0 2016-01-25
0 2016-01-26
1 2016-01-31

I'd like to calculate the number of occurrences for x == 1 within a specified time period, e.g. 14 and 30 days from the current date (but excluding the current entry, if it is x == 1. The desired output would look like this:

solution
x       date x_plus14 x_plus30
1 2016-01-01        1        3
0 2016-01-05        1        4
1 2016-01-07        2        3
0 2016-01-12        2        3
0 2016-01-16        2        3
1 2016-01-20        2        2
1 2016-01-20        1        1
0 2016-01-25        1        1
0 2016-01-26        1        1
1 2016-01-31        0        0

Ideally, I'd like this to be in dplyr, but it is not a must. Any ideas how to achieve this? Thanks a lot for your help!


回答1:


Adding another approach based on findInterval:

cs = cumsum(df$x) # cumulative number of occurences
data.frame(df, 
           plus14 = cs[findInterval(df$date + 14, df$date, left.open = TRUE)] - cs, 
           plus30 = cs[findInterval(df$date + 30, df$date, left.open = TRUE)] - cs)
#   x       date plus14 plus30
#1  1 2016-01-01      1      3
#2  0 2016-01-05      1      4
#3  1 2016-01-07      2      3
#4  0 2016-01-12      2      3
#5  0 2016-01-16      2      3
#6  1 2016-01-20      2      2
#7  1 2016-01-20      1      1
#8  0 2016-01-25      1      1
#9  0 2016-01-26      1      1
#10 1 2016-01-31      0      0



回答2:


Earlier I wasn't including the present date and so numbers didn't match.

library(data.table)
setDT(df)[, `:=`(x14 = sum(df$x[between(df$date, date, date + 14, incbounds = FALSE)]), 
                 x30 = sum(df$x[between(df$date, date, date + 30, incbounds = FALSE)])),
              by = date]

#     x       date x14 x30
#  1: 1 2016-01-01   1   3
#  2: 0 2016-01-05   1   4
#  3: 1 2016-01-07   2   3
#  4: 0 2016-01-12   2   3
#  5: 0 2016-01-16   2   3
#  6: 1 2016-01-20   1   1
#  7: 1 2016-01-20   1   1
#  8: 0 2016-01-25   1   1
#  9: 0 2016-01-26   1   1
# 10: 1 2016-01-31   0   0

Or a general solution that will work for any desired range

vec <- c(14, 30) # Specify desired ranges
setDT(df)[, paste0("x", vec) := 
            lapply(vec, function(i) sum(df$x[between(df$date, 
                                                     date, 
                                                     date + i, 
                                                     incbounds = FALSE)])),
            by = date]



回答3:


Here's my stab at it with some dplyr+purrr help. I got slightly different counts due to the <= and >= in the helper function x_next() if you adjust them properly i think you should be able to get what you want. hth.

library("tidyverse")
library("lubridate")
set.seed(1453)

x = sample(0:1, 10, TRUE)
dates = c('2016-01-01', '2016-01-05', '2016-01-07',  '2016-01-12',  '2016-01-16',  '2016-01-20',
         '2016-01-20',  '2016-01-25',  '2016-01-26',  '2016-01-31')

df = data_frame(x = x, dates = lubridate::as_date(dates))

# helper function to calculate the sum of xs in the next days_in_future
x_next <- function(d, days_in_future) {

  df %>% 
    # subset on days of interest
    filter(dates > d & dates <= d + days(days_in_future)) %>% 
    # sum up xs
    summarise(sum = sum(x)) %>% 
    # have to unlist them so that the (following) call to mutate works
    unlist(use.names=F)
  }

# mutate your df
df %>% 
  mutate(xplus14 = map(dates, x_next, 14),
         xplus30 = map(dates, x_next, 30))



回答4:


A concise dplyr and purrr solution:

library(tidyverse)

sample %>% 
  mutate(x_plus14 = map(date, ~sum(x == 1 & between(date, . + 1, . + 14))),
         x_plus30 = map(date, ~sum(x == 1 & between(date, . + 1, . + 30))))
   x       date x_plus14 x_plus30
1  1 2016-01-01        1        4
2  0 2016-01-05        1        4
3  1 2016-01-07        2        3
4  0 2016-01-12        2        3
5  0 2016-01-16        2        3
6  1 2016-01-20        1        1
7  1 2016-01-20        1        1
8  0 2016-01-25        1        1
9  0 2016-01-26        1        1
10 1 2016-01-31        0        0



回答5:


As other already mentioned, it is strange that you do not count the day from and you should avoid naming objects by names of functions (sample). However, the code bellow reproduce your desired output:

set.seed(1453); 
x = sample(0:1, 10, TRUE)
date = c('2016-01-01', '2016-01-05', '2016-01-07',  '2016-01-12',  '2016-01-16',  '2016-01-20',
             '2016-01-20',  '2016-01-25',  '2016-01-26',  '2016-01-31')


sample = data.frame(x = x, date = as.Date(sample$date))

getOccurences <- function(one_row, sample_data, date_range){
  one_date <- as.Date(one_row[2])
  sum(sample$x[sample_data$date > one_date & 
               sample_data$date < one_date + date_range])
}

sample$x_plus14 <- apply(sample,1,getOccurences, sample, 14)
sample$x_plus30 <- apply(sample,1,getOccurences, sample, 30)

sample

   x       date x_plus14 x_plus30
1  1 2016-01-01        1        3
2  0 2016-01-05        1        4
3  1 2016-01-07        2        3
4  0 2016-01-12        2        3
5  0 2016-01-16        2        3
6  1 2016-01-20        1        1
7  1 2016-01-20        1        1
8  0 2016-01-25        1        1
9  0 2016-01-26        1        1
10 1 2016-01-31        0        0


来源:https://stackoverflow.com/questions/41593453/r-calculate-the-number-of-occurrences-of-a-specific-event-in-a-specified-time-f

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!