Determine season from Date using lubridate in R

刺人心 2020-12-16 04:46

I have a very large dataset with a DateTime column containing POSIXct values. I need to determine the season (winter or summer) for each DateTime value.

6 answers
  •  予麋鹿 (original poster)
    2020-12-16 05:40

    After several hours of debugging I found my mistake, and it's really quite absurd:

    If no season was found for a DateTime value, apply() returned a list instead of a vector (this was the case when the DateTime value equalled 2000-12-31 00:00:00). Returning a list caused a disproportionate increase in computation time and the crashes I described. Here is the fixed code:

    library(lubridate)

    # input date(s) and return the season half (WinterHalf / SummerHalf) for each value
    getTwoSeasons <- function(input.date) {
      # Boundaries in the reference year 2000 (a leap year, so 29 February stays valid).
      # Seasons are half-open intervals [start, end), so every instant matches exactly one.
      Winter1Start <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")
      SummerStart  <- as.POSIXct("2000-04-16 00:00:00", tz = "UTC")
      Winter2Start <- as.POSIXct("2000-10-16 00:00:00", tz = "UTC")
      Winter2End   <- as.POSIXct("2001-01-01 00:00:00", tz = "UTC")
    
      SeasonStart  <- c(Winter1Start, SummerStart, Winter2Start)
      SeasonsEnd   <- c(SummerStart, Winter2Start, Winter2End)
      Season_names <- factor(c("WinterHalf", "SummerHalf", "WinterHalf"))
    
      # Keep the wall-clock time but label it UTC, then move it into the reference year
      input.date <- force_tz(input.date, tzone = "UTC")
      year(input.date) <- year(Winter1Start)
    
      n <- length(input.date)
      # n x 3 logical matrices (one column per season); matrix() keeps the shape when n == 1
      Season_selectStart <- matrix(vapply(SeasonStart, function(s) s <= input.date, logical(n)), nrow = n)
      Season_selectEnd   <- matrix(vapply(SeasonsEnd, function(e) input.date < e, logical(n)), nrow = n)
      Season_selectBoth  <- Season_selectStart & Season_selectEnd
    
      # Column of the single TRUE in each row; NA inputs stay NA.
      # Unlike apply(), this always yields a factor of length n, never a list.
      hit <- which(Season_selectBoth, arr.ind = TRUE)
      idx <- rep(NA_integer_, n)
      idx[hit[, "row"]] <- hit[, "col"]
      Season_return <- Season_names[idx]
      return(Season_return)
    }
    

    The former "sub"-functions are now integrated into the main function, and the two sapply calls have been replaced with vapply.
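    The root cause (apply() silently returning a list) can be reproduced in isolation. This is a minimal sketch using a hypothetical 2 x 2 selection matrix; the names sel, labels, res_apply and res_vapply are made up for illustration. When one row contains no TRUE, apply() falls back to a list, whereas vapply() with FUN.VALUE = character(1) stops with an error instead of changing its return type.

    ```r
    # Hypothetical selection matrix: row 1 matches one season, row 2 matches none
    sel <- matrix(c(TRUE,  FALSE,
                    FALSE, FALSE), nrow = 2, byrow = TRUE)
    labels <- c("WinterHalf", "SummerHalf")

    # apply() simplifies to a vector only if every row returns the same length;
    # the empty result for row 2 turns the whole output into a list
    res_apply <- apply(sel, MARGIN = 1, function(row) labels[row])
    print(class(res_apply))   # "list"

    # vapply() demands exactly one character value per row and fails on row 2
    res_vapply <- tryCatch(
      vapply(seq_len(nrow(sel)), function(i) labels[sel[i, ]], FUN.VALUE = character(1)),
      error = function(e) e
    )
    print(conditionMessage(res_vapply))
    ```

    An error is easier to spot than a list that silently slows down everything downstream, which is the main benefit of switching to vapply here.
    
    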

    PS: There is still an issue with the timezone, since c() strips the timezone away. I'll update the code when I fix it.
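    Because the question mentions a very large dataset, a fully vectorized variant may be worth comparing. The sketch below swaps in a different technique (not the approach above): it encodes each timestamp's calendar day as the integer month * 100 + day using lubridate's month() and mday(), so no reference year or per-row matrices are needed. The function name seasonHalf and the boundaries (SummerHalf from 16 April through 15 October, inclusive) are assumptions chosen to mirror the season limits in the code above; values are read as wall-clock time in the vector's own time zone, which sidesteps the c() time zone issue.

    ```r
    library(lubridate)

    # Hypothetical vectorized helper: classify each POSIXct value by calendar day only.
    # Assumed boundaries: SummerHalf = 16 April .. 15 October (inclusive), else WinterHalf.
    seasonHalf <- function(x) {
      md <- month(x) * 100 + mday(x)   # e.g. 16 April -> 416, 15 October -> 1015
      factor(ifelse(md >= 416 & md <= 1015, "SummerHalf", "WinterHalf"),
             levels = c("SummerHalf", "WinterHalf"))
    }

    x <- as.POSIXct(c("2015-04-15 23:59:59", "2015-04-16 00:00:00",
                      "2015-10-15 12:00:00", "2015-12-31 00:00:00"), tz = "UTC")
    seasonHalf(x)
    # [1] WinterHalf SummerHalf SummerHalf WinterHalf
    # Levels: SummerHalf WinterHalf
    ```

    Every step operates on whole vectors, so there is no per-row apply() call, and NA timestamps simply come back as NA.
    
    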
