R ts with missing values


I have a data frame, read from a CSV file, that contains daily observations:

Date        Value 
2010-01-04  23.4
2010-01-05  12.7
2010-01-04  20.1
2010-01-07  18


                      
3 answers

心在旅途, 2020-12-10 08:12

You can use the imputeTS, zoo, or forecast packages, all of which offer functions for filling in missing data (a process also called imputation).

imputeTS

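# imputeTS 3.0 and later name these na_interpolation, na_seadec, na_kalman, na_ma
na.interpolation(yourdata)  # linear, spline, or Stineman interpolation
na.seadec(yourdata)         # seasonal decomposition, then imputation
na.kalman(yourdata)         # Kalman smoothing on a state space model
na.ma(yourdata)             # weighted moving average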


zoo

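na.approx(yourdata)    # linear interpolation
na.locf(yourdata)      # last observation carried forward
na.StructTS(yourdata)  # seasonal Kalman filter via a structural time series model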


forecast

na.interp(yourdata)  # linear interpolation; seasonal series are STL-decomposed first


These are some functions from the packages you could use.
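
To make concrete what two of these strategies do, here is a minimal base R sketch of last-observation-carried-forward and linear interpolation on a made-up vector. The helper names locf_fill and linear_fill are hypothetical and are not part of any of the packages above; zoo's na.locf and na.approx are the packaged counterparts and offer more options.

```r
# Toy series with gaps; the numbers are made up for illustration.
x <- c(10, NA, NA, 16, NA, 20)

# Last observation carried forward (hypothetical helper, not a package function)
locf_fill <- function(x) {
  observed <- !is.na(x)
  idx <- cumsum(observed)   # index of the latest observed value so far
  idx[idx == 0] <- NA       # leading NAs have nothing to carry forward
  x[observed][idx]
}

# Linear interpolation between neighbouring observed values (hypothetical helper)
linear_fill <- function(x) {
  observed <- !is.na(x)
  approx(which(observed), x[observed], xout = seq_along(x))$y
}

print(locf_fill(x))    # 10 10 10 16 16 20
print(linear_fill(x))  # 10 12 14 16 18 20
```

Carrying the last value forward keeps the series flat across a gap, while interpolation draws a straight line through it; which one is appropriate depends on what the data represent.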
                                                                        
                                                        
            
            
              
                
梦如初夏, 2020-12-10 08:13

You'll probably need to aggregate, yes, but the important thing is to be smart about it. If you simply aggregate to week level, using something like lubridate to map timestamps to weeks, you'll certainly end up with something the forecast package can consume. The data will be deceptive, though: weeks that are missing days will show smaller counts. That makes the dataset less useful for predictive modelling, because the model is not being shown what actually happened.
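
A small base R sketch of that pitfall, under made-up assumptions: every day has a count of 10, and three days are dropped from the second week. The data and the column names (Count, Week, Total, Days) are purely illustrative.

```r
# Two full weeks of hypothetical daily counts, starting on a Monday
dates  <- seq(as.Date("2010-01-04"), by = "day", length.out = 14)
counts <- rep(10, 14)

# Drop three days from the second week to simulate missing observations
dropped <- as.Date(c("2010-01-12", "2010-01-13", "2010-01-14"))
keep    <- !(dates %in% dropped)
obs     <- data.frame(Date = dates[keep], Count = counts[keep])

# Label each observation with the Monday that starts its week
obs$Week <- as.character(cut(obs$Date, breaks = "week"))

# Weekly totals alongside the number of days that contributed to each total
totals <- tapply(obs$Count, obs$Week, sum)
days   <- tapply(obs$Count, obs$Week, length)
weekly <- data.frame(Week  = names(totals),
                     Total = as.vector(totals),
                     Days  = as.vector(days))
print(weekly)
```

The second week's total comes out lower only because it has fewer days behind it, which is why keeping a per-week day count (or imputing the gaps before aggregating) is worth considering.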

My recommendation would be to look at the zoo time series package for handling this; it has a lot of functions for working out the probable value of a missing/NA entry, based on the other data it's handed. Install it and run:

library(zoo)
# List zoo's exported objects whose names start with "na" (the NA-handling helpers)
ls("package:zoo", pattern = "^na")


This lists the functions you are most likely to find relevant.
                                                                        
                                                        
            
            
              
                
自闭症患者, 2020-12-10 08:17

One option is to expand your date index to include the missing observations, and use na.approx from zoo to fill in the missing values via interpolation. 

# Build a complete daily date index spanning the observed range
allDates <- seq.Date(
  min(values$Date),
  max(values$Date),
  "day")
## Left-join the observations onto the full index; days without data become NA
allValues <- merge(
  x=data.frame(Date=allDates),
  y=values,
  all.x=TRUE)
R> head(allValues,7)
        Date      Value
1 2010-01-05 -0.6041787
2 2010-01-06  0.2274668
3 2010-01-07 -1.2751761
4 2010-01-08 -0.8696818
5 2010-01-09         NA
6 2010-01-10         NA
7 2010-01-11 -0.3486378
## Convert to a zoo series indexed by date
zooValues <- zoo(allValues$Value,allValues$Date)
R> head(zooValues,7)
2010-01-05 2010-01-06 2010-01-07 2010-01-08 2010-01-09 2010-01-10 2010-01-11 
-0.6041787  0.2274668 -1.2751761 -0.8696818         NA         NA -0.3486378 
## Fill the NA entries by linear interpolation
approxValues <- na.approx(zooValues)
R> head(approxValues,7)
2010-01-05 2010-01-06 2010-01-07 2010-01-08 2010-01-09 2010-01-10 2010-01-11 
-0.6041787  0.2274668 -1.2751761 -0.8696818 -0.6960005 -0.5223192 -0.3486378


Even with missing values, zooValues is still a legitimate zoo object; for example, plot(zooValues) will work, with gaps at the missing values. If you plan to fit some sort of model to the data, though, you will most likely be better off using na.approx to replace the missing values first.
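
As a sanity check on what the interpolation is doing, the two filled values in the output above can be reproduced with base R's approx(), using only the printed neighbours on 2010-01-08 and 2010-01-11. This is a sketch for illustration; the variable names are my own, and the result should agree with the na.approx output up to rounding of the printed endpoints.

```r
# Days elapsed since the last observed point before the gap (2010-01-08)
known_day   <- c(0, 3)                    # 2010-01-08 and 2010-01-11
known_value <- c(-0.8696818, -0.3486378)  # values copied from the output above

# Linearly interpolate at the two missing days (2010-01-09 and 2010-01-10)
filled <- approx(x = known_day, y = known_value, xout = c(1, 2))$y
print(filled)
```

Each missing day simply moves one third of the way further along the straight line between the two known values.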

Data:

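library(zoo)
library(lubridate)
## Generate 120 consecutive calendar dates starting the day after t0
t0 <- "2010-01-04"
Dates <- as.Date(ymd(t0))+1:120
# Keep weekdays only (the day names assume an English locale)
weekDays <- Dates[!(weekdays(Dates) %in% c("Saturday","Sunday"))]
## Simulated observations, one per weekday
set.seed(123)
values <- data.frame(Date=weekDays,Value=rnorm(length(weekDays)))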
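
Before interpolating, it can help to list which dates are actually absent from the series. A minimal base R sketch, using a shortened two-week version of the simulated dates above; the names observed, fullIndex and missingDates are illustrative, and format(x, "%u") is used in place of weekdays() so the result does not depend on the locale.

```r
# Two weeks of calendar dates starting the day after 2010-01-04
t0    <- as.Date("2010-01-04")
Dates <- t0 + 1:14

# Keep Monday to Friday only; "%u" is the ISO weekday number (6 = Saturday, 7 = Sunday)
observed <- Dates[!(format(Dates, "%u") %in% c("6", "7"))]

# Full daily index over the observed range, and the dates with no observation
fullIndex    <- seq(min(observed), max(observed), by = "day")
missingDates <- fullIndex[!(fullIndex %in% observed)]
print(missingDates)
```

These are exactly the rows that come out as NA after the merge and that na.approx then fills.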

                                                                        
                                                        
            
            
              
                