Adding the values of second column based on date and time of first column

≯℡__Kan透↙ submitted on 2019-12-12 22:04:12

Question


I have a data frame with 2 variables. The first column, "X", holds the date and time in the format dd/mm/yyyy hh:mm; the second column, "Y", holds electricity meter readings taken every 5 minutes. I want to sum the values over each half hour. For instance:

X                Y  
13/12/2014 12:00 1   
13/12/2014 12:05 2  
13/12/2014 12:10 1  
13/12/2014 12:15 2  
13/12/2014 12:20 2  
13/12/2014 12:25 1

In the end, I want to present the result as:

13/12/2014 12:00 9  
13/12/2014 12:30 12  

and so on...


Answer 1:


Here's an alternative approach which actually takes X into account (as per the OP's comment).

First, we make sure X is in proper POSIXct format so that we can manipulate it correctly (I'm using the data.table package here for convenience).

library(data.table)
setDT(df)[, X := as.POSIXct(X, format = "%d/%m/%Y %R")]
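
As a quick sanity check on the conversion step, here is a minimal base R sketch. The vector x_raw is a hypothetical stand-in for the X column, and tz = "UTC" is an assumption added only to make the result reproducible; the answer itself relies on the session's time zone.

```r
# Hypothetical stand-in for the X column (not the full data set)
x_raw <- c("13/12/2014 12:00", "13/12/2014 12:05")

# Parse dd/mm/yyyy hh:mm; tz = "UTC" is an assumption for reproducibility
x <- as.POSIXct(x_raw, format = "%d/%m/%Y %H:%M", tz = "UTC")

# A format mismatch yields NA rather than an error, so check explicitly
stopifnot(!anyNA(x))

# Once parsed, date-time arithmetic works: gap between readings in minutes
gap_mins <- as.numeric(difftime(x[2], x[1], units = "mins"))
```

Checking for NA right after parsing is worthwhile because a wrong format string fails silently.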

Then, we group the rows by a running count of the timestamps in X whose minutes are 00 or 30, summing Y and taking the first value of X within each group. I've made a more complicated data set in order to illustrate more complicated scenarios (see below).

df[order(X), .(X = X[1L], Y = sum(Y)), by = cumsum(format(X, "%M") %in% c("00", "30"))]
#    cumsum                   X Y
# 1:      0 2014-12-13 12:10:00 6
# 2:      1 2014-12-13 12:30:00 6
# 3:      2 2014-12-13 13:00:00 3

Data

df <- read.table(text = "X Y  
'13/12/2014 12:10' 1  
'13/12/2014 12:15' 2  
'13/12/2014 12:20' 2  
'13/12/2014 12:25' 1
'13/12/2014 12:30' 1
'13/12/2014 12:35' 1
'13/12/2014 12:40' 1
'13/12/2014 12:45' 1
'13/12/2014 12:50' 1
'13/12/2014 12:55' 1
'13/12/2014 13:00' 1
'13/12/2014 13:05' 1
'13/12/2014 13:10' 1", header = TRUE)

Some explanations

  • The by expression:
    • format(X, "%M") gets the minutes out of X (see ?strptime)
    • The next step is to check whether they match 00 or 30 (using %in%)
    • cumsum turns these matches into a running group number, which we aggregate by when this expression is put into the by argument (see ?data.table)
  • The jth expression
    • (X = X[1L], Y = sum(Y)) is simply getting the first value of X per each group and the sum of Y per each group.
  • The ith expression
    • I've added order(X) in order to make sure the data set is properly ordered by date (one of the main reasons I've converted X to proper POSIXct format)
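
The three steps of the by expression can be traced in base R on a small vector. This is only a sketch: x is a hypothetical five-row subset of the data above, and tz = "UTC" is an assumption made for reproducibility.

```r
# Hypothetical subset of the timestamps; tz = "UTC" is an assumption
x <- as.POSIXct(c("13/12/2014 12:10", "13/12/2014 12:25",
                  "13/12/2014 12:30", "13/12/2014 12:55",
                  "13/12/2014 13:00"),
                format = "%d/%m/%Y %H:%M", tz = "UTC")

minutes  <- format(x, "%M")             # "10" "25" "30" "55" "00"
is_start <- minutes %in% c("00", "30")  # FALSE FALSE TRUE FALSE TRUE
group    <- cumsum(is_start)            # 0 0 1 1 2
```

Note that a new group only starts when a reading exists at exactly :00 or :30; if that reading is missing, its rows fall into the preceding group.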

For a better understanding on how data.table works, see some tutorials here




Answer 2:


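# Assign each timestamp to a 30-minute window (1800 seconds) and sum Y per window
t1 <- tapply(df$Y, as.numeric(as.POSIXct(df$X, format = '%d/%m/%Y %H:%M')) %/% 1800, sum)
# Convert the window numbers back to date-times; the + 1800 labels each window by its end time
data.frame(time = as.POSIXct(as.numeric(names(t1))*1800 + 1800, origin = '1970-01-01'), t1)

t1 sums Y within groups formed by integer division of the timestamp (in seconds since 1970-01-01) by 1800, the number of seconds in 30 minutes. Because of the + 1800, each row is labelled with the end of its half-hour window; omit it to label rows by the start of the window, as in the question's expected output.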
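
As a worked sketch of the integer-division grouping, here it is applied to the six readings shown in the question, labelling the window by its start time. The names df_q, secs, window and res are illustrative, and tz = "UTC" is an assumption made for reproducibility.

```r
# The six readings from the question (df_q is an illustrative name)
df_q <- data.frame(
  X = c("13/12/2014 12:00", "13/12/2014 12:05", "13/12/2014 12:10",
        "13/12/2014 12:15", "13/12/2014 12:20", "13/12/2014 12:25"),
  Y = c(1, 2, 1, 2, 2, 1)
)

# Seconds since the epoch; tz = "UTC" is an assumption
secs   <- as.numeric(as.POSIXct(df_q$X, format = "%d/%m/%Y %H:%M", tz = "UTC"))
window <- secs %/% 1800                 # same value for all rows in one half hour
sums   <- tapply(df_q$Y, window, sum)   # sum of Y per window

# Multiply back by 1800 (without adding 1800) to get the window start
res <- data.frame(
  time = as.POSIXct(as.numeric(names(sums)) * 1800,
                    origin = "1970-01-01", tz = "UTC"),
  Y = as.vector(sums)
)
```

All six readings land in the 12:00 window, so res has a single row with Y equal to 9, matching the first line of the expected output.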




Answer 3:


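Assuming your data frame is called df, you can try the following. It sums every six consecutive rows, so it relies on the readings being sorted, complete, and starting on a half-hour boundary: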

unname(tapply(df$Y, (seq_along(df$Y)-1) %/% 6, sum))
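
To see how the row-index grouping behaves, here is a small sketch. The first six values of y come from the question; the last two are hypothetical and only there to show a second, partial group.

```r
# First six values are from the question; the last two are hypothetical
y <- c(1, 2, 1, 2, 2, 1, 3, 3)

# Row positions 1..6 map to group 0, positions 7..12 to group 1, and so on
idx  <- (seq_along(y) - 1) %/% 6
sums <- unname(tapply(y, idx, sum))
```

Here idx is 0 for the first six rows and 1 for the remaining two, so sums holds 9 and 6. The timestamps are never consulted, which is why a gap in the readings would shift every later group.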


Source: https://stackoverflow.com/questions/32097282/adding-the-values-of-second-column-based-on-date-and-time-of-first-column
