Group rows in data frame based on time difference between consecutive rows

爱一瞬间的悲伤 2020-12-06 22:50

I have a data frame of this type

YEAR   MONTH  DAY  HOUR       LON      LAT

1860     10      3   13      -19.50   3.00          
1860     10      3   17             


        
2 Answers
  • 2020-12-06 22:54

    Here is another possibility, which keeps consecutive rows in the same group as long as the time difference between them is at most 4 days; a gap of more than 4 days starts a new group.

    # create date variable
    df$date <- with(df, as.Date(paste(YEAR, MONTH, DAY, sep = "-")))
    
    # calculate successive differences between dates (in days)
    # and flag gaps larger than 4 days; the first row has no predecessor, so it gets 0
    df$gap <- c(0, diff(df$date) > 4)
    
    # cumulative sum of 'gap' variable
    df$group <- cumsum(df$gap) + 1
    
    df    
    #    YEAR MONTH DAY HOUR   LON LAT       date gap group
    # 1  1860    10   3   13 -19.5   3 1860-10-03   0     1
    # 2  1860    10   3   17 -19.5   4 1860-10-03   0     1
    # 3  1860    10   3   21 -19.5   5 1860-10-03   0     1
    # 4  1860    10   5    5 -20.5   6 1860-10-05   0     1
    # 5  1860    10   5   13 -21.5   7 1860-10-05   0     1
    # 6  1860    10   5   17 -21.5   8 1860-10-05   0     1
    # 7  1860    10   6    1 -22.5   9 1860-10-06   0     1
    # 8  1860    10   6    5 -22.5  10 1860-10-06   0     1
    # 9  1860    12   5    9 -22.5  -7 1860-12-05   1     2
    # 10 1860    12   5   18 -23.5  -8 1860-12-05   0     2
    # 11 1860    12   5   22 -23.5  -9 1860-12-05   0     2
    # 12 1860    12   6    6 -24.5 -10 1860-12-06   0     2
    # 13 1860    12   6   10 -24.5 -11 1860-12-06   0     2
    # 14 1860    12   6   18 -24.5 -12 1860-12-06   0     2
    

    Disclaimer: the diff & cumsum part is inspired by this Q&A: How to partition a vector into groups of regular, consecutive sequences?.
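
    If the hour should count as well, the same diff & cumsum idea can be applied to a full timestamp instead of a date. A minimal sketch on a small made-up data frame, assuming the times are in UTC; `max_gap_days` is a name introduced here for illustration only:

    ```r
    # Toy data (illustrative values, not the asker's full data set)
    df <- data.frame(
      YEAR  = 1860,
      MONTH = c(10, 10, 10, 10, 12, 12),
      DAY   = c(3, 3, 5, 6, 5, 6),
      HOUR  = c(13, 17, 5, 1, 9, 6)
    )

    # Build a timestamp that includes the hour (UTC is an assumption)
    df$datetime <- ISOdatetime(df$YEAR, df$MONTH, df$DAY, df$HOUR, 0, 0, tz = "UTC")

    # Gap to the previous row, converted to days; the first row gets 0
    gap_days <- c(0, as.numeric(diff(df$datetime), units = "days"))

    # A new group starts wherever the gap exceeds the threshold
    max_gap_days <- 4  # hypothetical parameter name
    df$group <- cumsum(gap_days > max_gap_days) + 1

    df$group
    # [1] 1 1 1 1 2 2
    ```

    Converting the difftime explicitly with `units = "days"` avoids depending on whichever unit `diff()` happens to pick for the data.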

  • 2020-12-06 23:16

    I would try something along these lines. Since you mention that you only need to figure out the subsetting logic, I haven't bothered to add the correlation coefficient calculation.

    df$date <- as.Date(paste(df$YEAR,df$MONTH,df$DAY),'%Y %m %d')
    
    uniquedates <- unique(df$date)
    uniquedatesfourth <- uniquedates + 4
    
    # for each unique date, take the rows from that date up to 4 days later
    for (i in seq_along(uniquedates))
    {
       tempsubset <- subset(df, date >= uniquedates[i] & date <= uniquedatesfourth[i])
       # operations on tempsubset
    }
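
    For the part left out here, the per-window calculation might look roughly like the following. This is only a sketch on made-up values, and it assumes the coefficient wanted is the Pearson correlation between LON and LAT, which is not stated in this answer. `window_days` and `window_cor` are hypothetical names.

    ```r
    # Toy data (illustrative values only)
    df <- data.frame(
      date = as.Date("1860-10-03") + c(0, 0, 2, 2, 3, 63, 63, 64),
      LON  = c(-19.5, -19.5, -20.5, -21.5, -22.5, -22.5, -23.5, -24.5),
      LAT  = c(3, 4, 6, 7, 9, -7, -8, -10)
    )

    window_days <- 4  # hypothetical parameter name
    starts <- unique(df$date)

    # For each start date, correlate LON and LAT over [start, start + window_days]
    window_cor <- sapply(seq_along(starts), function(i) {
      w <- df[df$date >= starts[i] & df$date <= starts[i] + window_days, ]
      # cor() is not meaningful for fewer than 2 rows or a constant column
      if (nrow(w) < 2 || sd(w$LON) == 0 || sd(w$LAT) == 0) return(NA_real_)
      cor(w$LON, w$LAT)
    })

    data.frame(start = starts, cor = window_cor)
    ```

    Windows that contain a single row come back as NA rather than raising an error, so the result always has one entry per start date.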
    