I have time-series data as shown below:
2015-04-26 23:00:00 5704.27388916015661380
2015-04-27 00:00:00 4470.30868326822928793
2015-04-27 01:00:00 4552.57241617838553793
2015-04-27 02:00:00 4570.22250032825650123
2015-04-27 03:00:00 NA
2015-04-27 04:00:00 NA
2015-04-27 05:00:00 NA
2015-04-27 06:00:00 12697.37724086216439900
2015-04-27 07:00:00 5538.71119009653739340
2015-04-27 08:00:00 81.95060647328695325
2015-04-27 09:00:00 8550.65816895300667966
2015-04-27 10:00:00 2925.76573206583680076
How should I handle consecutive NA values? When there is only a single NA, I usually replace it with the average of its two neighbouring values. Are there any standard approaches for dealing with runs of consecutive missing values?
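A small base-R sketch of the single-NA approach described above may make the problem concrete. The helper name `fill_isolated_na` is hypothetical. It fills an NA only when both neighbours are observed, so a run of NAs like the 03:00 to 05:00 gap is left untouched. For one isolated gap, the neighbour average is exactly the linear-interpolation value at the midpoint.

```r
# Hypothetical helper: replace an NA with the mean of its two neighbours,
# but only when both neighbours are observed. Runs of 2+ NAs stay NA.
fill_isolated_na <- function(x) {
  n <- length(x)
  out <- x
  for (i in seq_len(n)) {
    if (is.na(x[i]) && i > 1 && i < n &&
        !is.na(x[i - 1]) && !is.na(x[i + 1])) {
      out[i] <- (x[i - 1] + x[i + 1]) / 2
    }
  }
  out
}

x <- c(10, NA, 20, NA, NA, NA, 40)
fill_isolated_na(x)
# [1] 10 15 20 NA NA NA 40
```

The isolated NA becomes 15, while the run of three NAs cannot be filled this way, which is why a method that spans whole gaps is needed.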
The zoo package has several functions for dealing with NA values. One of the following functions might suit your needs:
na.locf: Last observation carried forward (LOCF). Setting the parameter fromLast = TRUE gives next observation carried backward (NOCB) instead.
na.aggregate: Replaces the NAs with an aggregated value. The default aggregation function is the mean, but you can specify other functions as well. See ?na.aggregate for more info.
na.approx: Replaces the NAs with linearly interpolated values.
You can compare the outcomes to see what these functions do:
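library(zoo)
# Last observation carried forward
df$V.loc <- na.locf(df$V2)
# Replace each NA with the mean of the non-missing values
df$V.agg <- na.aggregate(df$V2)
# Linear interpolation between the surrounding non-missing values
df$V.app <- na.approx(df$V2)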
This results in:
> df
V1 V2 V.loc V.agg V.app
1 2015-04-26 23:00:00 5704.27389 5704.27389 5704.27389 5704.27389
2 2015-04-27 00:00:00 4470.30868 4470.30868 4470.30868 4470.30868
3 2015-04-27 01:00:00 4552.57242 4552.57242 4552.57242 4552.57242
4 2015-04-27 02:00:00 4570.22250 4570.22250 4570.22250 4570.22250
5 2015-04-27 03:00:00 NA 4570.22250 5454.64894 6602.01119
6 2015-04-27 04:00:00 NA 4570.22250 5454.64894 8633.79987
7 2015-04-27 05:00:00 NA 4570.22250 5454.64894 10665.58856
8 2015-04-27 06:00:00 12697.37724 12697.37724 12697.37724 12697.37724
9 2015-04-27 07:00:00 5538.71119 5538.71119 5538.71119 5538.71119
10 2015-04-27 08:00:00 81.95061 81.95061 81.95061 81.95061
11 2015-04-27 09:00:00 8550.65817 8550.65817 8550.65817 8550.65817
12 2015-04-27 10:00:00 2925.76573 2925.76573 2925.76573 2925.76573
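Applied to a plain vector, linear interpolation works by position. Because these observations are evenly spaced one hour apart, interpolating against the timestamps gives the same V.app values. The short check below assumes UTC timestamps built with `seq()` (only the hourly spacing matters, not the time zone); `v2`, `times`, `by_time` and `by_pos` are illustrative names. With irregularly spaced timestamps the two approaches would generally differ, and interpolating against time is then usually the more sensible choice.

```r
v2 <- c(5704.27388916016, 4470.30868326823, 4552.57241617839, 4570.22250032826,
        NA, NA, NA, 12697.3772408622, 5538.71119009654, 81.950606473287,
        8550.65816895301, 2925.76573206584)
# Assumed hourly timestamps (UTC); only the spacing affects the result.
times <- seq(as.POSIXct("2015-04-26 23:00:00", tz = "UTC"),
             by = "hour", length.out = length(v2))
ok <- !is.na(v2)

# Interpolate against time (seconds since epoch) ...
by_time <- approx(as.numeric(times)[ok], v2[ok], xout = as.numeric(times))$y
# ... and against position, as done for a plain vector.
by_pos <- approx(which(ok), v2[ok], xout = seq_along(v2))$y

all.equal(by_time, by_pos)
# [1] TRUE
```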
Data used:
df <- structure(list(V1 = structure(c(1430082000, 1430085600, 1430089200, 1430092800, 1430096400, 1430100000, 1430103600, 1430107200, 1430110800, 1430114400, 1430118000, 1430121600), class = c("POSIXct", "POSIXt"), tzone = ""), V2 = c(5704.27388916016, 4470.30868326823, 4552.57241617839, 4570.22250032826, NA, NA, NA, 12697.3772408622, 5538.71119009654, 81.950606473287, 8550.65816895301, 2925.76573206584)), .Names = c("V1", "V2"), row.names = c(NA, -12L), class = "data.frame")
Addition:
The imputeTS and forecast packages also provide time-series functions for dealing with NAs, including some more advanced methods.
For example:
library("imputeTS")
# Moving average imputation
na.ma(df$V2)
# Imputation via Kalman smoothing on structural time series models
na.kalman(df$V2)
# Plain interpolation, with a choice of method (linear, spline, stine)
na.interpolation(df$V2)
# Note: imputeTS 3.0 and later name these na_ma(), na_kalman() and na_interpolation()
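The idea behind moving-average imputation can be sketched in base R. The function `ma_impute` below is hypothetical and deliberately simplified: it uses equal weights over a window of `k` observations on each side and widens the window until at least two observed values fall inside it. It is not imputeTS's exact algorithm, which uses its own default window size and non-uniform weighting (see its help page).

```r
# Simplified, equal-weight moving-average imputation (illustrative only).
ma_impute <- function(x, k = 2) {
  out <- x
  n <- length(x)
  for (i in which(is.na(x))) {
    w <- k
    repeat {
      window <- x[max(1, i - w):min(n, i + w)]
      vals <- window[!is.na(window)]
      # Widen the window until it holds at least two observed values
      if (length(vals) >= 2 || w >= n) break
      w <- w + 1
    }
    # Average only original observations, never previously imputed values
    if (length(vals) > 0) out[i] <- mean(vals)
  }
  out
}

v2 <- c(5704.27388916016, 4470.30868326823, 4552.57241617839, 4570.22250032826,
        NA, NA, NA, 12697.3772408622, 5538.71119009654, 81.950606473287,
        8550.65816895301, 2925.76573206584)
ma_impute(v2)[5:7]
```

With `k = 2`, the 04:00 value is the mean of the 02:00 and 06:00 observations (about 8633.80), while the values at the edges of the gap lean towards their nearer neighbours, unlike a single mean applied to every NA.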
or
library("forecast")
# Linear interpolation for non-seasonal series; for seasonal series, interpolation
# of the seasonally adjusted data from an STL decomposition
na.interp(df$V2)
Source: https://stackoverflow.com/questions/32694313/handle-continous-missing-values-in-time-series-data