R: How to get the Week number of the month

前端 未结 9 828
孤街浪徒
孤街浪徒 2020-11-30 04:05

I am new in R.
I want the week number of the month, which the date belongs to.

By using the following code:

>CurrentDate<-Sys.Date()
>We         


        
9条回答
  •  离开以前
    2020-11-30 04:55

    Issue Overview

    It was difficult to tell which answers worked, so I built my own function nth_week and tested it against the others.

    The issue that's leading to most of the answers being incorrect is this:

    • The first week of a month is often a short-week
    • Same with the last week of the month

    For example, October 1st 2019 is a Tuesday, so 6 days into October (which is a Sunday) is already the second week. Also, contiguous months often share the same week in their respective counts, meaning that the last week of the prior month is commonly also the first week of the current month. So, we should expect a week count higher than 52 per year and some months that contain a span of 6 weeks.

    Results Comparison

    Here's a table showing examples where some of the above suggested algorithms go awry:

    DATE            Tori user206 Scri Klev Stringi Grot Frei Vale epi iso coni
    Fri-2016-01-01    1     1      1   1      5      1    1    1    1   1   1
    Sat-2016-01-02    1     1      1   1      1      1    1    1    1   1   1
    Sun-2016-01-03    2     1      1   1      1      2    2    1  -50   1   2
    Mon-2016-01-04    2     1      1   1      2      2    2    1  -50 -51   2
    ----
    Sat-2018-12-29    5     5      5   5      5      5    5    4    5   5   5
    Sun-2018-12-30    6     5      5   5      5      6    6    4  -46   5   6
    Mon-2018-12-31    6     5      5   5      6      6    6    4  -46 -46   6
    Tue-2019-01-01    1     1      1   1      6      1    1    1    1   1   1
    

    You can see that only Grothendieck, conighion, Freitas, and Tori are correct due to their treatment of partial week periods. I compared all days from year 100 to year 3000; there are no differences among those 4. (Stringi is probably correct for noting weekends as separate, incremented periods, but I didn't check to be sure; epiweek() and isoweek(), because of their intended uses, show some odd behavior near year-ends when using them for week incrementation.)

    Speed Comparison

    Below are the tests for efficiency between the implementations of: Tori, Grothendieck, Conighion, and Freitas

    # prep
    library(lubridate)
    library(tictoc)
    
    kepler<- ymd(15711227) # Kepler's birthday since it's a nice day and gives a long vector of dates
    some_dates<- seq(kepler, today(), by='day')
    
    
    # test speed of Tori algorithm
    tic(msg = 'Tori')
    Tori<- (5 + day(some_dates) + wday(floor_date(some_dates, 'month'))) %/% 7
    toc()
    Tori: 0.19 sec elapsed
    
    # test speed of Grothendieck algorithm
    wk <- function(x) as.numeric(format(x, "%U"))
    tic(msg = 'Grothendieck')
    Grothendieck<- (wk(some_dates) - wk(as.Date(cut(some_dates, "month"))) + 1)
    toc()
    Grothendieck: 1.99 sec elapsed
    
    # test speed of conighion algorithm
    tic(msg = 'conighion')
    weeknum <- as.integer( format(some_dates, format="%U") )
    mindatemonth <- as.Date( paste0(format(some_dates, "%Y-%m"), "-01") )
    weeknummin <- as.integer( format(mindatemonth, format="%U") ) # the number of the week of the first week within the month
    conighion <- weeknum - (weeknummin - 1) # this is as an integer
    toc()
    conighion: 2.42 sec elapsed
    
    # test speed of Freitas algorithm
    first_day_of_month_wday <- function(dx) {
       day(dx) <- 1
       wday(dx)
     }
    tic(msg = 'Freitas')
    Freitas<- ceiling((day(some_dates) + first_day_of_month_wday(some_dates) - 1) / 7)
    toc()
    Freitas: 0.97 sec elapsed
    



    Fastest correct algorithm by about at least 5X

    require(lubridate)

    (5 + day(some_dates) + wday(floor_date(some_dates, 'month'))) %/% 7

    # some_dates above is any vector of dates, like:
    some_dates<- seq(ymd(20190101), today(), 'day')
    



    Function Implementation

    I also wrote a generalized function for it that performs either month or year week counts, begins on a day you choose (i.e. say you want to start your week on Monday), labels output for easy checking, and is still extremely fast thanks to lubridate.

    nth_week<- function(dates = NULL,
                        count_weeks_in = c("month","year"),
                        begin_week_on = "Sunday"){
    
      require(lubridate)
    
      count_weeks_in<- tolower(count_weeks_in[1])
    
      # day_names and day_index are for beginning the week on a day other than Sunday
      # (this vector ordering matters, so careful about changing it)
      day_names<- c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")
    
      # index integer of first match
      day_index<- pmatch(tolower(begin_week_on),
                         tolower(day_names))[1]
    
    
      ### Calculate week index of each day
    
      if (!is.na(pmatch(count_weeks_in, "year"))) {
    
        # For year:
        # sum the day of year, index for day of week at start of year, and constant 5 
        #  then integer divide quantity by 7   
        # (explicit on package so lubridate and data.table don't fight)
        n_week<- (5 + 
                    lubridate::yday(dates) + 
                    lubridate::wday(floor_date(dates, 'year'), 
                                    week_start = day_index)
        ) %/% 7
    
      } else {
    
        # For month:
        # same algorithm as above, but for month rather than year
        n_week<- (5 + 
                    lubridate::day(dates) + 
                    lubridate::wday(floor_date(dates, 'month'), 
                                    week_start = day_index)
        ) %/% 7
    
      }
    
      # naming very helpful for review
      names(n_week)<- paste0(lubridate::wday(dates,T), '-', dates)
    
      n_week
    
    }
    



    Function Output

    # Example raw vector output: 
    some_dates<- seq(ymd(20190930), today(), by='day')
    nth_week(some_dates)
    
    Mon-2019-09-30 Tue-2019-10-01 Wed-2019-10-02 
                 5              1              1 
    Thu-2019-10-03 Fri-2019-10-04 Sat-2019-10-05 
                 1              1              1 
    Sun-2019-10-06 Mon-2019-10-07 Tue-2019-10-08 
                 2              2              2 
    Wed-2019-10-09 Thu-2019-10-10 Fri-2019-10-11 
                 2              2              2 
    Sat-2019-10-12 Sun-2019-10-13 
                 2              3 
    
    # Example tabled output:
    library(tidyverse)
    
    nth_week(some_dates) %>% 
      enframe('DATE','nth_week_default') %>% 
      cbind(some_year_day_options = as.vector(nth_week(some_dates, count_weeks_in = 'year', begin_week_on = 'Mon')))
    
                 DATE nth_week_default some_year_day_options
    1  Mon-2019-09-30                5                    40
    2  Tue-2019-10-01                1                    40
    3  Wed-2019-10-02                1                    40
    4  Thu-2019-10-03                1                    40
    5  Fri-2019-10-04                1                    40
    6  Sat-2019-10-05                1                    40
    7  Sun-2019-10-06                2                    40
    8  Mon-2019-10-07                2                    41
    9  Tue-2019-10-08                2                    41
    10 Wed-2019-10-09                2                    41
    11 Thu-2019-10-10                2                    41
    12 Fri-2019-10-11                2                    41
    13 Sat-2019-10-12                2                    41
    14 Sun-2019-10-13                3                    41
    

    Hope this work saves people the time of having to weed through all the responses to figure out which are correct.

提交回复
热议问题