Determine season from Date using lubridate in R

后端 未结 6 2056
刺人心
刺人心 2020-12-16 04:46

I have a very big dataset with a DateTime Column containing POSIXct-Values. I need to determine the season (Winter - Summer) based on the DateTime

相关标签:
6条回答
  • 2020-12-16 05:28

    I packaged @Lars Arne Jordanger's much more elegant approach into a function:

    getTwoSeasons <- function(input.date){
      numeric.date <- 100*month(input.date)+day(input.date)
      ## input Seasons upper limits in the form MMDD in the "break =" option:
      cuts <- base::cut(numeric.date, breaks = c(0,415,1015,1231)) 
      # rename the resulting groups (could've been done within cut(...levels=) if "Winter" wasn't double
      levels(cuts) <- c("Winter", "Summer","Winter")
      return(cuts)
    }
    

    Testing it on some sample data seems to work fine:

    getTwoSeasons(as.POSIXct("2016-01-01 12:00:00")+(0:365)*(60*60*24))
    
    0 讨论(0)
  • 2020-12-16 05:39

    And if you're interested in getting back four seasons, here's code to do that:

    library(lubridate)
    getSeason <- function(input.date){
      numeric.date <- 100*month(input.date)+day(input.date)
      ## input Seasons upper limits in the form MMDD in the "break =" option:
      cuts <- base::cut(numeric.date, breaks = c(0,319,0620,0921,1220,1231)) 
      # rename the resulting groups (could've been done within cut(...levels=) if "Winter" wasn't double
      levels(cuts) <- c("Winter","Spring","Summer","Fall","Winter")
      return(cuts)
    }
    

    Unit Test:

    getSeason(as.POSIXct("2016-01-01 12:00:00")+(0:365)*(60*60*24))
    
    0 讨论(0)
  • 2020-12-16 05:40

    After several hours of debugging I've found my mistake, and it's quite absurd really:

    If a season for a DateTimeValue was not found, apply returned list-object instead of a vector (this was the case when the DateTime value equalled 2000-12-31 00:00:00). Returning a list created an an overproportional increase in computation time and the described crashes. Here's a the fixed code:

    # input date and return 2 season
    getTwoSeasons <- function(input.date) {
      Winter1Start <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")
      Winter1End <- as.POSIXct("2000-04-15 23:59:59", tz = "UTC")
    
      SummerStart <- Winter1End + 1
      SummerEnd <- as.POSIXct("2000-10-15 23:59:59", tz = "UTC")
    
      Winter2Start <- SummerEnd + 1
      Winter2End <- as.POSIXct("2001-01-01 00:00:01", tz = "UTC")
    
      SeasonStart <- c(Winter1Start,SummerStart,Winter2Start)
      SeasonsEnd <- c(Winter1End,SummerEnd,Winter2End)
      Season_names <- factor(c("WinterHalf","SummerHalf","WinterHalf"))
    
      year(input.date) <- year(Winter1Start)
      attr(input.date, "tzone") <- attr(Winter1Start, "tzone")
    
      Season_selectStart <- vapply(X = SeasonStart,function(x,y){x <= input.date},FUN.VALUE = logical(length(input.date)),y = input.date)
      Season_selectEnd   <- vapply(X = SeasonsEnd,function(x,y){x > input.date},FUN.VALUE = logical(length(input.date)),y = input.date)
      Season_selectBoth  <- Season_selectStart & Season_selectEnd
      Season_return <- apply(Season_selectBoth,MARGIN = 1,function(x,y){y[x]}, y = Season_names)
      return(Season_return)
    }
    

    The "sub"-functions are now integrated in the main function and two sapply functions replaced with vapply.

    PS: There is still an issue with the timezone, since c() strips the timezone away. I'll update the code when I fix it.

    0 讨论(0)
  • 2020-12-16 05:46

    Use the POSXlt instead of POSXct.

    I made my own function depending on the definition of seasons that I am using. I created vectors named normal for a non-leap year and leap for leap year with each season name repeated the no. of times it appears starting from Jan 1. And created the following function.

    SEASON <- function(datee){
      
      datee <- as.POSIXlt(datee)
      season <- vector()
      normal <- rep(c("Winter","Spring","Summer","Monsoon","Autumn","Winter"), c(46,44,91,77,76,31))
      leap <- rep(c("Winter","Spring","Summer","Monsoon","Autumn","Winter"), c(46,45,91,77,76,31))
    
      
      if(leap_year(year(datee)) == FALSE){
        season <- normal[datee$yday+1]
      } else {
        season <- leap[datee$yday+1]
      }
      return(season)
    }
    

    Let's put it to test for some dataset.

    Dates <- seq(as.POSIXct("2000-01-01"), as.POSIXct("2010-01-01"), by= "day")
    sapply(Dates, SEASON)
    

    It works.

    0 讨论(0)
  • 2020-12-16 05:47

    For completeness, worth noting that lubridate now has a quarter (and a semester) function. quarter splits the year into fourths and semester into halves:

    library(lubridate)
    
    quarter(x, with_year = FALSE, fiscal_start = 1)
    semester(x, with_year = FALSE)
    

    For more, see: https://www.rdocumentation.org/packages/lubridate/versions/1.7.4/topics/quarter

    0 讨论(0)
  • The following strategy can also be used: The basic observation is that substr can extract the month and day information we need in order to decide if it's summer or winter. The idea is then to convert this to numbers of the form month.date, and the test for being summer then boils down to having a number larger than 4.15 but smaller than 10.16.

    The example below shows how this can be done when a vector of dates first are transformed into the alternative presentation described above, and then a vector that tells if it is summer "TRUE" or winter "FALSE" will be created based on this.

    DateTime <- as.POSIXct(x  = "2000-01-01 00:00:00",
                           tz = "UTC") +
        (0:1000)*(60*60*24)
    
    DateTime_2 <- as.numeric(paste(
        substr(x = DateTime,
               start = 6,
               stop = 7),
        substr(x = DateTime,
               start = 9,
               stop = 10),
        sep = "."))
    
    .season <- (DateTime_2 > 4.15) & (DateTime_2 < 10.16)
    
    0 讨论(0)
提交回复
热议问题