Question
I have a data frame as below with 5000+ rows. I am trying to insert a row wherever a month is missing (e.g. month 6 of 2003 below) and then use linear interpolation to calculate the 'TWS' value. Ideally the decimal date (DecimDate) would be filled in appropriately too, but I can sort this out afterwards if not! The data frame covers months 1:12 for 10 years (2003-2012), but this repeats for multiple grid squares.
I have found lots of other similar questions, but none relating to a repeating 1:12 monthly sequence.
> head(ts.data,20)
GridNo GridIndex Lon Lat DecimDate Year Month TWS
1 GR72 72 35.5 -4.5 2003.000 2003 01 14.2566781
2 GR72 72 35.5 -4.5 2003.083 2003 02 5.0413706
3 GR72 72 35.5 -4.5 2003.167 2003 03 3.8192721
4 GR72 72 35.5 -4.5 2003.250 2003 04 5.8706026
5 GR72 72 35.5 -4.5 2003.333 2003 05 7.8461188
6 GR72 72 35.5 -4.5 2003.500 2003 07 2.3821844
7 GR72 72 35.5 -4.5 2003.583 2003 08 0.1995629
8 GR72 72 35.5 -4.5 2003.667 2003 09 -1.8353604
9 GR72 72 35.5 -4.5 2003.750 2003 10 -2.0410653
10 GR72 72 35.5 -4.5 2003.833 2003 11 -1.4029813
11 GR72 72 35.5 -4.5 2003.917 2003 12 -0.2206872
12 GR72 72 35.5 -4.5 2004.000 2004 01 -0.5090872
13 GR72 72 35.5 -4.5 2004.083 2004 02 -0.4887118
14 GR72 72 35.5 -4.5 2004.167 2004 03 -0.7725966
15 GR72 72 35.5 -4.5 2004.250 2004 04 4.1831581
16 GR72 72 35.5 -4.5 2004.333 2004 05 2.5651040
17 GR72 72 35.5 -4.5 2004.417 2004 06 -2.2511409
18 GR72 72 35.5 -4.5 2004.500 2004 07 -1.6484375
19 GR72 72 35.5 -4.5 2004.583 2004 08 -4.6508982
20 GR72 72 35.5 -4.5 2004.667 2004 09 -5.0053745
Any help appreciated!
Answer 1:
Using the data.table and zoo packages you can easily expand your data set and interpolate, as long as you don't have NAs at both sides of the year.
Expand the data set
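To illustrate that caveat, here is a minimal sketch on a made-up vector (the values are hypothetical, not taken from the question). With na.rm = FALSE, na.approx fills the interior gap but leaves leading and trailing NAs alone, because there is no value on one side to interpolate from.

```r
library(zoo)

# Hypothetical toy vector: NAs at both edges and one in the middle.
x <- c(NA, 1, NA, 3, NA)

# na.rm = FALSE keeps the vector length; only the interior NA is filled.
filled <- na.approx(x, na.rm = FALSE)
print(filled)
```

This is why trailing months with no later observation stay NA after the interpolation step.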
library(data.table)
library(zoo)
res <- setDT(ts.data)[, .SD[match(1:12, Month)], by = Year]
Interpolate on whatever column you want
cols <- c("Month", "DecimDate", "TWS")
res[, (cols) := lapply(.SD, na.approx, na.rm = FALSE), .SDcols = cols]
res
# Year GridNo GridIndex Lon Lat DecimDate Month TWS
# 1: 2003 GR72 72 35.5 -4.5 2003.000 1 14.2566781
# 2: 2003 GR72 72 35.5 -4.5 2003.083 2 5.0413706
# 3: 2003 GR72 72 35.5 -4.5 2003.167 3 3.8192721
# 4: 2003 GR72 72 35.5 -4.5 2003.250 4 5.8706026
# 5: 2003 GR72 72 35.5 -4.5 2003.333 5 7.8461188
# 6: 2003 NA NA NA NA 2003.417 6 5.1141516
# 7: 2003 GR72 72 35.5 -4.5 2003.500 7 2.3821844
# 8: 2003 GR72 72 35.5 -4.5 2003.583 8 0.1995629
# 9: 2003 GR72 72 35.5 -4.5 2003.667 9 -1.8353604
# 10: 2003 GR72 72 35.5 -4.5 2003.750 10 -2.0410653
# 11: 2003 GR72 72 35.5 -4.5 2003.833 11 -1.4029813
# 12: 2003 GR72 72 35.5 -4.5 2003.917 12 -0.2206872
# 13: 2004 GR72 72 35.5 -4.5 2004.000 1 -0.5090872
# 14: 2004 GR72 72 35.5 -4.5 2004.083 2 -0.4887118
# 15: 2004 GR72 72 35.5 -4.5 2004.167 3 -0.7725966
# 16: 2004 GR72 72 35.5 -4.5 2004.250 4 4.1831581
# 17: 2004 GR72 72 35.5 -4.5 2004.333 5 2.5651040
# 18: 2004 GR72 72 35.5 -4.5 2004.417 6 -2.2511409
# 19: 2004 GR72 72 35.5 -4.5 2004.500 7 -1.6484375
# 20: 2004 GR72 72 35.5 -4.5 2004.583 8 -4.6508982
# 21: 2004 GR72 72 35.5 -4.5 2004.667 9 -5.0053745
# 22: 2004 NA NA NA NA NA NA NA
# 23: 2004 NA NA NA NA NA NA NA
# 24: 2004 NA NA NA NA NA NA NA
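Since the real data repeats for multiple grid squares, the expansion and the interpolation probably need to be done per grid square as well. Below is a rough base R sketch of that idea, not a tested solution for the full data set. The `toy` data frame and its values are made up for illustration, and the DecimDate formula (Year + (Month - 1) / 12) is an assumption inferred from the values shown in the question.

```r
# Hypothetical toy data: two grid squares, 2003 only, each missing one month.
toy <- data.frame(
  GridNo = rep(c("GR72", "GR73"), each = 11),
  Year   = 2003,
  Month  = c((1:12)[-6], (1:12)[-9]),
  stringsAsFactors = FALSE
)
toy$TWS <- ifelse(toy$GridNo == "GR72", 2, 3) * toy$Month

# Every GridNo x Year x Month combination that should exist.
full <- expand.grid(Month = 1:12,
                    Year = unique(toy$Year),
                    GridNo = unique(toy$GridNo),
                    stringsAsFactors = FALSE)

# Left join: months absent from `toy` appear with TWS = NA.
res <- merge(full, toy, all.x = TRUE)
res <- res[order(res$GridNo, res$Year, res$Month), ]

# Decimal date rebuilt from Year and Month (assumed formula).
res$DecimDate <- res$Year + (res$Month - 1) / 12

# Interpolate TWS within each grid square only, never across squares.
for (idx in split(seq_len(nrow(res)), res$GridNo)) {
  res$TWS[idx] <- approx(res$DecimDate[idx], res$TWS[idx],
                         xout = res$DecimDate[idx])$y
}
```

Columns that are constant per grid square (GridIndex, Lon, Lat) could be carried along by including them in the lookup grid instead of leaving them NA. Gaps at the very start or end of a grid square's series would still come out as NA with approx's default rule = 1.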
Answer 2:
I would simply first transform your dates into actual Dates (here taking the first of every month):
dates <- as.Date(paste(ts.data$Year, ts.data$Month, 1, sep="-"))
Do the same for the target missing months (here just one, but it works with many):
target <- as.Date("2003-06-01")
And do the approximation:
approx(dates, ts.data$TWS, target)
$x
[1] "2003-06-01"
$y
[1] 5.069365
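If there are many missing months, the targets do not have to be typed by hand. A possible sketch, assuming a single series with hypothetical toy values, builds the full monthly sequence and treats whatever is absent as the target:

```r
# Hypothetical toy data: 2003 with June and September missing.
toy <- data.frame(Year = 2003, Month = c(1:5, 7:8, 10:12))
toy$TWS <- toy$Month * 2
dates <- as.Date(paste(toy$Year, toy$Month, 1, sep = "-"))

# Every month start between the first and last observation.
all_months <- seq(min(dates), max(dates), by = "month")

# Months with no observation become the interpolation targets.
target <- all_months[!all_months %in% dates]

# Interpolate on the underlying day counts of the Date objects.
filled <- approx(as.numeric(dates), toy$TWS, xout = as.numeric(target))$y
data.frame(Date = target, TWS = filled)
```

Note that interpolating on real dates weights by the number of days in each month, so the result differs slightly from interpolating on the month index (5.069365 here versus 5.1141516 in Answer 1 for June 2003).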
So in the context of your dataframe (here simplified):
ts.data <- data.frame(
  Year = c(rep(2003, 11), rep(2004, 9)),
  Month = c((1:12)[-6], 1:9),
  TWS = c(14.2566781, 5.0413706, 3.8192721, 5.8706026, 7.8461188,
          2.3821844, 0.1995629, -1.8353604, -2.0410653, -1.4029813,
          -0.2206872, -0.5090872, -0.4887118, -0.7725966, 4.1831581,
          2.5651040, -2.2511409, -1.6484375, -4.6508982, -5.0053745))
dates <- as.Date(paste(ts.data$Year, ts.data$Month, 1, sep="-"))
target <- as.Date("2003-06-01")
ts.data <- rbind(ts.data,
                 data.frame(Year=2003,
                            Month=6,
                            TWS=approx(dates, ts.data$TWS, target)$y))
ts.data <- ts.data[order(ts.data$Year, ts.data$Month),]
Source: https://stackoverflow.com/questions/31383601/r-insert-row-for-missing-monthly-data-and-interpolate