I have these three intervals defined:
YEAR_1 <- interval(ymd('2002-09-01'), ymd('2003-08-31'))
YEAR_2 <- interval(ymd('2003-09-01'), ymd('20
Everybody has their favourite tool for this; mine happens to be data.table, because of what it refers to as its dt[i, j, by] logic.
library(data.table)
dt <- data.table(date = as.IDate(pt))
dt[, YR := 0.0 ] # I am using a numeric for year here...
dt[ date >= as.IDate("2002-09-01") & date <= as.IDate("2003-08-31"), YR := 1 ]
dt[ date >= as.IDate("2003-09-01") & date <= as.IDate("2004-08-31"), YR := 2 ]
dt[ date >= as.IDate("2004-09-01") & date <= as.IDate("2005-08-31"), YR := 3 ]
I create a data.table object, converting your times to dates for later comparison. I then set up a new column, YR, defaulting to zero.
We then execute three conditional statements: for each of the three intervals (which I just create by hand using the endpoints), we set the YR value to 1, 2 or 3.
This has the desired effect, as we can see from:
R> print(dt, topn=5, nrows=10)
           date YR
  1: 2003-06-11  1
  2: 2004-08-11  2
  3: 2004-06-03  2
  4: 2004-01-20  2
  5: 2005-02-25  3
 ---
 96: 2002-08-07  0
 97: 2004-02-04  2
 98: 2006-04-10  0
 99: 2005-03-21  3
100: 2003-12-01  2
R> table(dt[, YR])
 0  1  2  3
26 31 31 12
R>
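The three hand-written conditions can also be folded into a single lookup against the interval start dates. What follows is only a sketch in base R, not something the code above relies on: the `breaks` vector is my own illustration, the five sample dates are copied from the printed output, and it assumes the intervals are contiguous, as they are here.

```r
# Sample dates copied from the printed output above (stand-in for the full data)
d <- as.Date(c("2002-08-07", "2003-06-11", "2004-01-20",
               "2005-02-25", "2006-04-10"))

# Start of each interval, plus the first day after the last interval ends
breaks <- as.Date(c("2002-09-01", "2003-09-01", "2004-09-01", "2005-09-01"))

# findInterval() gives 0 before the first break and 4 on or after the last one
idx <- findInterval(as.numeric(d), as.numeric(breaks))

# Keep 1..3 and map everything outside the three intervals to 0
YR <- ifelse(idx %in% 1:3, idx, 0)
print(data.frame(date = d, YR = YR))
```

With a data.table, the same vector could be assigned by reference after computing `idx` from `dt$date`, for example `dt[, YR := ifelse(idx %in% 1:3, idx, 0)]`.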
One could also have done this simply by computing date differences and truncating down, but it is nice to be a little explicit at times.
Edit: A more generic form just uses arithmetic on the dates:
R> dt[, YR2 := trunc(as.numeric(difftime(as.Date(date),
+ as.Date("2001-09-01"),
+ units="days"))/365.25)]
R> table(dt[, YR2])
 0  1  2  3  4  5  6  7  9
 7 31 31 12  9  5  1  2  1
R>
This does the job in one line.
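Dividing by 365.25 is an approximation, so a date sitting exactly on a boundary can end up in the neighbouring year: 2003-09-01 is 730 days after 2001-09-01, and 730/365.25 truncates to 1 rather than 2. If that matters, a sketch that works from the calendar year and month might look like the following. `academic_year()` and its arguments are hypothetical names of mine, and the numbering is chosen to match YR2.

```r
# Hypothetical helper: year index counted from September 2001,
# so 2002-09-01 .. 2003-08-31 maps to 1, matching YR2 above
academic_year <- function(d, origin_year = 2001L, start_month = 9L) {
  lt  <- as.POSIXlt(d)
  yr  <- lt$year + 1900L   # POSIXlt stores years since 1900
  mon <- lt$mon + 1L       # POSIXlt months run 0..11
  # Months before September still belong to the previous index
  (yr - origin_year) - (mon < start_month)
}

d <- as.Date(c("2002-08-07", "2003-08-31", "2003-09-01", "2005-02-25"))

# The one-line arithmetic version, for comparison
approx_yr <- trunc(as.numeric(difftime(d, as.Date("2001-09-01"),
                                       units = "days")) / 365.25)
exact_yr <- academic_year(d)
print(data.frame(date = d, approx_yr, exact_yr))
```

The two columns differ only for 2003-09-01 in this sample. In a data.table, the helper could be applied as `dt[, YR2 := academic_year(as.Date(date))]`.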