I am working on converting week-based dates to month-based dates.
When checking my work, I found the following problem in my data which is the result of a
As @lmo said in the comments, `%u` stands for the weekday as a decimal number (1–7, with Monday as 1) and `%U` stands for the week of the year as a decimal number (00–53), using Sunday as the first day of the week. Thus, `as.Date("2016-50-7", format = "%Y-%U-%u")` will result in `"2016-12-11"`.
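A quick way to see this is to format a few dates around the week boundary with the same specifiers. In this small sketch (the dates are chosen only for illustration), week 50 of 2016 in `%U` numbering opens on Sunday, December 11. That Sunday carries `%u` = 7 but belongs to the start of week 50, not the end:

```r
# Format three consecutive days around a %U week boundary
d <- as.Date(c("2016-12-10", "2016-12-11", "2016-12-12"))
format(d, "%Y-%U-%u")
# [1] "2016-49-6" "2016-50-7" "2016-50-1"
```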
However, if that should give `"2016-12-18"`, then you should use a week format that also uses Monday as the first day. According to the documentation of `?strptime`, you would expect the format `"%Y-%V-%u"` to give the correct output, where `%V` stands for the week of the year as a decimal number (01–53) as defined in ISO 8601, with Monday as the first day.
Unfortunately, it doesn't:
```r
> as.Date("2016-50-7", format = "%Y-%V-%u")
[1] "2016-01-18"
```
However, the description of `%V` ends with "Accepted but ignored on input", meaning that it won't work when parsing.
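The specifiers do work in the other direction, so the limitation is on parsing only. A small sketch, assuming an R build whose `format()` supports `%G` and `%V` on output (the documentation marks them as ignored only on input):

```r
x <- as.Date(c("2016-12-15", "2016-12-18", "2016-12-19"))
# ISO 8601 week-based year, week number (Monday-based) and weekday
format(x, "%G-%V-%u")
# [1] "2016-50-4" "2016-50-7" "2016-51-1"
```

These strings match the week-based numbering in the question, which is why `"%Y-%V-%u"` looks like the natural choice for parsing.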
You can circumvent this behavior as follows to get the correct dates:
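```r
# create a vector of dates
d <- c("2016-50-4", "2016-50-5", "2016-50-6", "2016-50-7", "2016-51-1")
# convert to the correct dates: shift the Monday-based day (1-7) down by one so it
# can be read as %w (0-6, Sunday = 0), parse, then add the day back
as.Date(paste0(substr(d, 1, 8), as.integer(substring(d, 9)) - 1), "%Y-%U-%w") + 1
```
which gives:
```
[1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19"
```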
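The `- 1` / `+ 1` trick works because each `%U` week starts on Sunday, so the wanted date is simply that Sunday plus the Monday-based day number (Monday is Sunday + 1, and the following Sunday is Sunday + 7). The same idea can be written more explicitly. `week_day_to_date()` below is a hypothetical helper, not part of base R, and it assumes the week numbers follow `%U` numbering with a Monday-based day from 1 to 7:

```r
# Hypothetical helper: "YYYY-WW-D" -> Date, where WW is a %U week (weeks start on
# Sunday) and D is the day of the week with Monday = 1 and Sunday = 7
week_day_to_date <- function(x) {
  parts <- do.call(rbind, strsplit(x, "-", fixed = TRUE))
  # Sunday that opens the %U week (%w = 0 means Sunday)
  sunday <- as.Date(paste(parts[, 1], parts[, 2], "0", sep = "-"), format = "%Y-%U-%w")
  # Monday is one day after that Sunday, the following Sunday seven days after
  sunday + as.integer(parts[, 3])
}

week_day_to_date(c("2016-50-4", "2016-50-5", "2016-50-6", "2016-50-7", "2016-51-1"))
# [1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19"
```

Weeks at the very start or end of a year (week 00 or 53) have not been checked with this sketch.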
The issue is that for `%u`, `1` is Monday and `7` is Sunday. The problem is further complicated by the fact that `%U` assumes the week begins on Sunday.
For the given input and the expected behavior of `format = "%Y-%U-%u"`, the output of line 4 is consistent with the output of the previous 3 lines.
That is, if you want to use `format = "%Y-%U-%u"`, you should pre-process your input. In this case, the fourth line would have to be `as.Date("2016-51-7", format = "%Y-%U-%u")`, as revealed by
```r
format(as.Date("2016-12-18"), "%Y-%U-%u")
# "2016-51-7"
```
Instead, you are currently passing `"2016-50-7"`.
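One way to see what the pre-processed input should look like is to format the expected dates with the same specifiers and parse the result back. A short sketch, using the dates from December 15 to 19, 2016 as the assumed expected output:

```r
expected <- as.Date(c("2016-12-15", "2016-12-16", "2016-12-17", "2016-12-18", "2016-12-19"))
# Encode the expected dates the way format = "%Y-%U-%u" reads them
encoded <- format(expected, "%Y-%U-%u")
encoded
# [1] "2016-50-4" "2016-50-5" "2016-50-6" "2016-51-7" "2016-51-1"
# Parsing the encoded strings gives the expected dates back
as.Date(encoded, format = "%Y-%U-%u")
# [1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19"
```

Only the Sunday moves to the next week number; every other day keeps its week.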
A better way might be to use the approach suggested in Uwe Block's answer. Since you are happy with `"2016-50-4"` being transformed to `"2016-12-15"`, I suspect that in your raw data Monday is counted as `1` too. You could also create a custom function that changes the value of `%U` so that the week number is counted as if the week began on Monday, giving the output you expected.
```r
# Function to change the value of %U so that the week begins on Monday
pre_process <- function(x, delim = "-") {
  y <- unlist(strsplit(x, delim))
  # If the last day of the year is 7 (Sunday for %u),
  # add 1 to the week to make it week 00 of the next year
  # I think there might be a better solution for this
  if (y[2] == "53" && y[3] == "7") {
    x <- paste(as.integer(y[1]) + 1, "00", y[3], sep = delim)
  } else if (y[3] == "7") {
    # If the day is 7 (Sunday for %u), add 1 to the week (kept two digits wide)
    x <- paste(y[1], sprintf("%02d", as.integer(y[2]) + 1L), y[3], sep = delim)
  }
  return(x)
}
```
And usage would be:
```r
as.Date(pre_process("2016-50-7"), format = "%Y-%U-%u")
# [1] "2016-12-18"
```
I'm not quite sure how to handle the case where the year ends on a Sunday.
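`pre_process()` handles one string at a time. To convert a whole column, the same adjustment can be vectorised. `pre_process_vec()` below is a hypothetical sketch that bumps the week number only for Sundays (day 7) and, like the original, does not attempt to handle Sundays in the last week of the year:

```r
# Hypothetical vectorised variant of pre_process(): for day 7 (Sunday for %u),
# move to the next %U week; other days are left unchanged.
# Year-end Sundays (week 52/53) are NOT handled.
pre_process_vec <- function(x, delim = "-") {
  parts <- do.call(rbind, strsplit(x, delim, fixed = TRUE))
  week <- as.integer(parts[, 2]) + (parts[, 3] == "7")
  paste(parts[, 1], sprintf("%02d", week), parts[, 3], sep = delim)
}

d <- c("2016-50-4", "2016-50-5", "2016-50-6", "2016-50-7", "2016-51-1")
as.Date(pre_process_vec(d), format = "%Y-%U-%u")
# [1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19"
```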
Working with weeks of the year can become very tricky. You may try to convert the dates using the `ISOweek` package:
```r
# create date strings in the format given by the OP
wd <- c("2016-50-4", "2016-50-5", "2016-50-6", "2016-50-7", "2016-51-1", "2016-52-7")
# convert to "normal" dates
ISOweek::ISOweek2date(stringr::str_replace(wd, "-", "-W"))
```
The result is of class `Date`:
```
#[1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19" "2017-01-01"
```
Note that the ISO week-based date format is `yyyy-Www-d`, with a capital `W` preceding the week number. This is required to distinguish it from the standard month-based date format `yyyy-mm-dd`.
So, in order to convert the date strings provided by the OP using `ISOweek2date()`, it is necessary to insert a `W` after the first hyphen, which is accomplished by replacing the first `-` with `-W` in each string.
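If you would rather not depend on stringr just for this step, base R's `sub()` also replaces only the first match in each string (`fixed = TRUE` simply treats `"-"` as a literal rather than a regular expression):

```r
wd <- c("2016-50-4", "2016-50-5", "2016-50-6", "2016-50-7", "2016-51-1", "2016-52-7")
# Insert a "W" after the first hyphen only
sub("-", "-W", wd, fixed = TRUE)
# [1] "2016-W50-4" "2016-W50-5" "2016-W50-6" "2016-W50-7" "2016-W51-1" "2016-W52-7"
```

The result can be passed to `ISOweek::ISOweek2date()` in place of the `stringr::str_replace()` call.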
Also note that ISO weeks start on Monday and the days of the week are numbered 1 to 7. The year an ISO week belongs to may differ from the calendar year. This can be seen in the sample dates above, where the week-based date `2016-W52-7` is converted to `2017-01-01`.
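Where installing extra packages is not an option, the ISO 8601 rule can also be sketched in base R: week 1 is the week containing January 4, so its Monday is January 4 minus that day's distance from Monday. `iso_week_to_date()` is a hypothetical helper written for illustration (not how ISOweek is implemented); it accepts strings with or without the `W` and does no validation of week or day ranges:

```r
# Hypothetical base-R helper: ISO week-based date "YYYY-Www-D" (or "YYYY-ww-D") -> Date
iso_week_to_date <- function(x) {
  parts <- do.call(rbind, strsplit(sub("W", "", x, fixed = TRUE), "-", fixed = TRUE))
  year <- as.integer(parts[, 1])
  week <- as.integer(parts[, 2])
  day  <- as.integer(parts[, 3])  # 1 = Monday, 7 = Sunday
  jan4 <- as.Date(sprintf("%04d-01-04", year))
  # POSIXlt$wday is 0 = Sunday to 6 = Saturday; (wday + 6) %% 7 is days since Monday
  week1_monday <- jan4 - (as.POSIXlt(jan4)$wday + 6) %% 7
  week1_monday + 7 * (week - 1) + (day - 1)
}

wd <- c("2016-W50-4", "2016-W50-7", "2016-W51-1", "2016-W52-7")
iso_week_to_date(wd)
# [1] "2016-12-15" "2016-12-18" "2016-12-19" "2017-01-01"
```

The last entry again shows an ISO week of 2016 ending in calendar year 2017.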
ISOweek package
Back in 2011, the `%G`, `%g`, `%u`, and `%V` format specifications weren't available to `strptime()` in the Windows version of R. This was annoying, as I had to prepare weekly reports including week-on-week comparisons. I spent hours trying to find a solution for dealing with ISO weeks, ISO weekdays, and ISO years. Finally, I ended up creating the `ISOweek` package and publishing it on CRAN. Today, the package still has its merits, as `%G`, `%g`, and `%V` are still ignored on input (see `?strptime` for details).