how to sum data by month and store them in the first day of the month

♀尐吖头ヾ 提交于 2019-12-24 15:12:14

问题


Lets say I have a data frame as follows:

gageID    date        flow_cms
1011000 1937-02-19   25.768334
1011000 1937-02-20   24.918828
1011000 1937-02-21   24.069322

I want to aggregate the rows that have the same month summing the flow, and store the result into a new data value: the first day of every month; in order to obtain the following output:

gageID  date  sum_monthly_flow
1011000 1937-02-01  500.2222
1011000 1937-03-01  589.222

I'm using this line:

>rowsum(qfile$flow_cms, format(qfile$date, '%Y-%m-01'))

and I obtain the right sum, but I want also to reduce the record days in a unique day: the first of every month! with the strip shows above, R cannot recognize the left coloumn as data (or date).

Help would be very much appreciated!


回答1:


First make sure your "date" column is properly formatted as a date object in R:

qfile$date <- as.Date(qfile$date, format = "%Y-%m-%d")

Then we can use format to extract the month and year, and group_by that for a sum and take the first date:

library(dplyr)
qfile %>% mutate(monthyear = as.character(format(date, "%m-%Y"))) %>%
          arrange(date) %>% 
          group_by(monthyear) %>%
          summarise(date=date[1], flow = sum(flow_cms))

This will give you the first taken record for each month in the data.




回答2:


Another solution:

df
   gageID       date  flow_cms
1 1011000 1937-02-19  25.76833
2 1011000 1937-02-20  24.91883
3 1011000 1937-02-21  24.06932
4 1011000 1937-03-20  12.22200
5 1011000 1937-03-01 589.22200

df1 = aggregate(list(flow_cms=df$flow_cms),by=list(dates=cut(as.POSIXct(df$date),"month")),sum)

df1
       dates  flow_cms
1 1937-02-01  74.75648
2 1937-03-01 601.44400



回答3:


Using data.table and lubridate you can try:

library(data.table)
setDT(qfile)[, lapply(.SD, sum), by = .(gageID, date = lubridate::floor_date(date, "month"))]
    gageID       date  flow_cms
1: 1011000 1937-02-01  74.75648
2: 1011000 1937-03-01 601.44400

Note that it is assumed that date is already of class Date and that gageID is another grouping parameter.

Alternatively, the call to a function from another package can be avoided using data.table's own mday() function (thanks to @Henrik):

setDT(qfile)[, lapply(.SD, sum), by = .(gageID, date = date - mday(date) + 1)] 

Data

Here, Abdou's sample data is used:

qfile <- structure(list(gageID = c(1011000L, 1011000L, 1011000L, 1011000L, 
1011000L), date = structure(c(-12004, -12003, -12002, -11975, 
-11994), class = "Date"), flow_cms = c(25.76833, 24.91883, 24.06932, 
12.222, 589.222)), .Names = c("gageID", "date", "flow_cms"), row.names = c(NA, 
-5L), class = "data.frame")


来源:https://stackoverflow.com/questions/33109673/how-to-sum-data-by-month-and-store-them-in-the-first-day-of-the-month

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!