as.Date produces unexpected result in a sequence of week-based dates

前端 未结 3 1540
难免孤独
难免孤独 2020-12-11 16:55

I am working on the transformation of week based dates to month based dates.

When checking my work, I found the following problem in my data which is the result of a

相关标签:
3条回答
  • 2020-12-11 17:05

    As @lmo said in the comments, %u stands for the weekdays as a decimal number (1–7, with Monday as 1) and %U stands for the week of the year as decimal number (00–53) using Sunday as the first day. Thus, as.Date("2016-50-7", format = "%Y-%U-%u") will result in "2016-12-11".

    However, if that should give "2016-12-18", then you should use a week format that has also Monday as starting day. According to the documentation of ?strptime you would expect that the format "%Y-%V-%u" thus gives the correct output, where %V stands for the week of the year as decimal number (01–53) with monday as the first day.

    Unfortunately, it doesn't:

    > as.Date("2016-50-7", format = "%Y-%V-%u")
    [1] "2016-01-18"
    

    However, at the end of the explanation of %V it sais "Accepted but ignored on input" meaning that it won't work.

    You can circumvent this behavior as follows to get the correct dates:

    # create a vector of dates
    d <- c("2016-50-4","2016-50-5","2016-50-6","2016-50-7", "2016-51-1")
    
    # convert to the correct dates
    as.Date(paste0(substr(d,1,8), as.integer(substring(d,9))-1), "%Y-%U-%w") + 1
    

    which gives:

    [1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19"
    
    0 讨论(0)
  • 2020-12-11 17:06

    The issue is because for %u, 1 is Monday and 7 is Sunday of the week. The problem is further complicated by the fact that %U assumes week begins on Sunday.

    For the given input and expected behavior of format = "%Y-%U-%u", the output of line 4 is consistent with the output of previous 3 lines.

    That is, if you want to use format = "%Y-%U-%u", you should pre-process your input. In this case, the fourth line would have to be as.Date("2016-51-7", format = "%Y-%U-%u") as revealed by

    format(as.Date("2016-12-18"), "%Y-%U-%u")
    # "2016-51-7"
    

    Instead, you are currently passing "2016-50-7".

    Better way of doing it might be to use the approach suggested in Uwe Block's answer. Since you are happy with "2016-50-4" being transformed to "2016-12-15", I suspect in your raw data, Monday is counted as 1 too. You could also create a custom function that changes the value of %U to count the week number as if week begins on Monday so that the output is as you expected.

    #Function to change value of %U so that the week begins on Monday
    pre_process = function(x, delim = "-"){
        y = unlist(strsplit(x,delim))
        # If the last day of the year is 7 (Sunday for %u),
        # add 1 to the week to make it the week 00 of the next year
        # I think there might be a better solution for this
        if (y[2] == "53" & y[3] == "7"){
            x = paste(as.integer(y[1])+1,"00",y[3],sep = delim)
        } else if (y[3] == "7"){
        # If the day is 7 (Sunday for %u), add 1 to the week 
            x = paste(y[1],as.integer(y[2])+1,y[3],sep = delim)
        }
        return(x)
    }
    

    And usage would be

    as.Date(pre_process("2016-50-7"), format = "%Y-%U-%u")
    # [1] "2016-12-18"
    

    I'm not quite sure how to handle when the year ends on a Sunday.

    0 讨论(0)
  • 2020-12-11 17:11

    Working with week of the year can become very tricky. You may try to convert the dates using the ISOweek package:

    # create date strings in the format given by the OP
    wd <- c("2016-50-4","2016-50-5","2016-50-6","2016-50-7", "2016-51-1", "2016-52-7")
    # convert to "normal" dates
    ISOweek::ISOweek2date(stringr::str_replace(wd, "-", "-W"))
    

    The result

    #[1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19" "2017-01-01"
    

    is of class Date.

    Note that the ISO week-based date format is yyyy-Www-d with a capital W preceeding the week number. This is required to distinguish it from the standard month-based date format yyyy-mm-dd.

    So, in order to convert the date strings provided by the OP using ISOweek2date() it is necessary to insert a W after the first hyphen which is accomplished by replacing the first - by -W in each string.

    Also note that ISO weeks start on Monday and the days of the week are numbered 1 to 7. The year which belongs to an ISO week may differ from the calendar year. This can be seen from the sample dates above where the week-based date 2016-W52-7 is converted to 2017-01-01.

    About the ISOweek package

    Back in 2011, the %G, %g, %u, and %V format specifications weren't available to strptime() in the Windows version of R. This was annoying as I had to prepare weekly reports including week-on-week comparisons. I spent hours to find a solution for dealing with ISO weeks, ISO weekdays, and ISO years. Finally, I ended up creating the ISOweek package and publishing it on CRAN. Today, the package still has its merits as the aforementioned formats are ignored on input (see ?strptime for details).

    0 讨论(0)
提交回复
热议问题