How to match dates in 2 data frames in R, then sum specific range of values up to that date?

六眼飞鱼酱① 提交于 2021-01-29 04:35:33

问题


I have two data frames: rainfall data collected daily and nitrate concentrations of water samples collected irregularly, approximately once a month. I would like to create a vector of values for each nitrate concentration that is the sum of the previous 5 days' rainfall. Basically, I need to match the nitrate date with the rain date, sum the previous 5 days' rainfall, then print the sum with the nitrate data.

I think I need to either make a function, a for loop, or use tapply to do this, but I don't know how. I'm not an expert at any of those, though I've used them in simple cases. I've searched for similar posts, but none get at this exactly. This one deals with summing by factor groups. This one deals with summing each possible pair of rows. This one deals with summing by aggregate.

Here are 2 example data frames:

# rainfall df
mm<- c(0,0,0,0,5, 0,0,2,0,0, 10,0,0,0,0)
date<- c(1:15)
rain <- data.frame(cbind(mm, date))
# b/c sums of rainfall depend on correct chronological order, make sure the data are in order by date.
rain[ do.call(order, list(rain$date)),] 

# nitrate df 
nconc <- c(15, 12, 14, 20, 8.5) # nitrate concentration
ndate<- c(6,8,11,13,14)
nitrate <- data.frame(cbind(nconc, ndate))

I would like to have a way of finding the matching rainfall date for each nitrate measurement, such as:

match(nitrate$date[i] %in% rain$date)

(Note: Will match work with as.Date dates?) And then sum the preceding 5 days' rainfall (not including the measurement date), such as:

sum(rain$mm[j-6:j-1]

And prints the sum in a new column in nitrate

print(nitrate$mm_sum[i])

To make sure it's clear what result I'm looking for, here's how to do the calculation 'by hand'. The first nitrate concentration was collected on day 6, so the sum of rainfall on days 1-5 is 5mm.

Many thanks in advance.


回答1:


You were more or less there!

nitrate$prev_five_rainfall = NA 
for (i in 1:length(nitrate$ndate)) { 
    day = nitrate$ndate[i] 
    nitrate$prev_five_rainfall[i] = sum(rain$mm[(day-6):(day-1)])
}

Step by step explanation:

Initialize empty result column:

nitrate$prev_five_rainfall = NA 

For each line in the nitrate df: (i = 1,2,3,4,5)

for (i in 1:length(nitrate$ndate)) { 

Grab the day we want final result for:

    day = nitrate$ndate[i] 

Take the rainfull sum and it put in in the results column

    nitrate$prev_five_rainfall[i] = sum(rain$mm[(day-6):(day-1)])

Close the for loop :)

}

Disclaimer: This answer is basic in that:

  • It will break if nitrate's ndate < 6
  • It will be incorrect if some dates are missing in the rain dataframe
  • It will be slow on larger data

As you get more experience with R, you might use data manipulation packages like dplyr or data.table for these types of manipulations.




回答2:


@nelsonauner's answer does all the heavy lifting. But one thing to note, in my actual data my dates are not numerical like they are in the example above, they are dates listed as MM/DD/YYYY with the appropriate as.Date(nitrate$date, "%m/%d/%Y").

I found that the for loop above gave me all zeros for nitrate$prev_five_rainfall and I suspected it was a problem with the dates.

So I changed my dates in both data sets to numerical using the difference in number of days between a common start date and the recorded date, so that the for loop would look for a matching number of days in each data frame rather than a date. First, make a column of the start date using rep_len() and format it:

nitrate$startdate <- rep_len("01/01/1980", nrow(nitrate)) 
nitrate$startdate <- as.Date(all$startdate, "%m/%d/%Y")

Then, calculate the difference using difftime():

nitrate$diffdays <- as.numeric(difftime(nitrate$date, nitrate$startdate, units="days"))

Do the same for the rain data frame. Finally, the for loop looks like this:

nitrate$prev_five_rainfall = NA 
for (i in 1:length(nitrate$diffdays)) { 
    day = nitrate$diffdays[i] 
    nitrate$prev_five_rainfall[i] = sum(rain$mm[(day-5):(day-1)])  # 5 days
}


来源:https://stackoverflow.com/questions/40467987/how-to-match-dates-in-2-data-frames-in-r-then-sum-specific-range-of-values-up-t

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!