R - check if string contains dates within specific date range

I have the following requirement in an R script (to write an expression function in Spotfire):

dateString <- "04/30/2015 03/21/2015 06/28/2015 12/19/2015"
s


                      
2 Answers

春和景丽
2020-12-11 07:46

You can use the convenient between function from either dplyr or data.table after converting the strings to 'Date' class.  'dateString' is a single string, which we can split at whitespace with strsplit or simply with scan.

library(lubridate)
library(data.table)
between(mdy(scan(text=dateString, what='', quiet=TRUE)),
        mdy(startDate), mdy(endDate))


The single expression above can be split into separate steps for easier understanding.

# split the string into substrings at whitespace
v1 <- scan(text=dateString, what='', quiet=TRUE)
# convert to Date class
v2 <- mdy(v1)
# use between to get a logical index of the dates
# that are between 'startDate' and 'endDate'
res <- between(v2, mdy(startDate), mdy(endDate))
res
#[1]  TRUE FALSE  TRUE FALSE
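
The first step above uses scan; strsplit, mentioned earlier, should give the same substrings. A small base R sketch to check this, using only the 'dateString' value from the question (the names 'by_strsplit' and 'by_scan' are made up for illustration):

```r
dateString <- "04/30/2015 03/21/2015 06/28/2015 12/19/2015"

# strsplit returns a list with one element per input string,
# so take the first element to get a character vector
by_strsplit <- strsplit(dateString, " +")[[1]]

# scan reads whitespace-separated fields from the text;
# what = '' asks for character values
by_scan <- scan(text = dateString, what = '', quiet = TRUE)

identical(by_strsplit, by_scan)
#[1] TRUE
```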


For completeness, if we need 'Yes'/'No' values in place of TRUE/FALSE, we can use ifelse, which is the easier option to understand: elements that are TRUE are replaced with 'Yes', and all others with 'No'.

 ifelse(res, 'Yes', 'No')
 #[1] "Yes" "No"  "Yes" "No" 


Or use numeric indexing to replace the values in 'res'.

 c('No', 'Yes')[res+1L]
 #[1] "Yes" "No"  "Yes" "No" 


The step above may be a little confusing.  Whenever I find something less obvious, I split the code into the smallest possible pieces.  Here, I would look at

 res+1L
 #[1] 2 1 2 1


Adding to or multiplying a logical index coerces it to binary integers, i.e. 0/1.  Here we added 1L, the integer 1.  The TRUE values are coerced to 1, and 1 + 1 gives 2; the FALSE values are coerced to 0, and 0 + 1 gives 1.

With the logical index converted to a numeric index, we use it to subset the vector of strings c('No', 'Yes').  Note that 'No' is in the first position of that vector and 'Yes' in the second.  The result has the same length as the numeric index, i.e. 4, and each element is the string found at the position given by the index: 1 picks 'No' and 2 picks 'Yes'.
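
The coercion and lookup described above can be traced in a few lines of base R. This sketch hard-codes 'res' from the output shown earlier instead of recomputing it; 'idx' and 'labels' are illustrative names.

```r
# 'res' as shown in the output above
res <- c(TRUE, FALSE, TRUE, FALSE)

# arithmetic coerces FALSE/TRUE to 0/1, so adding 1L gives positions 1/2
idx <- res + 1L
idx
#[1] 2 1 2 1

# each position picks an element from the lookup vector
labels <- c('No', 'Yes')[idx]
labels
#[1] "Yes" "No"  "Yes" "No"

# same result as the ifelse version
identical(labels, ifelse(res, 'Yes', 'No'))
#[1] TRUE
```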



We could also do this without any external package.

 v2 <- as.Date(v1, '%m/%d/%Y')
 v2 >= as.Date(startDate, '%m/%d/%Y') & v2 <= as.Date(endDate, '%m/%d/%Y')
 #[1]  TRUE FALSE  TRUE FALSE


If the range should exclude 'startDate' and 'endDate' themselves, replace >=/<= with >/<.
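
The difference between the two comparisons only shows up when a date falls exactly on a bound. In the sketch below, the 'startDate' and 'endDate' values are assumptions chosen so that the first date equals the lower bound; they are not the values from the question, and 'lo', 'hi', 'inclusive' and 'exclusive' are illustrative names.

```r
v1 <- c("04/30/2015", "03/21/2015", "06/28/2015", "12/19/2015")
v2 <- as.Date(v1, '%m/%d/%Y')

# hypothetical bounds, chosen so the first date equals the start
startDate <- "04/30/2015"
endDate <- "07/31/2015"
lo <- as.Date(startDate, '%m/%d/%Y')
hi <- as.Date(endDate, '%m/%d/%Y')

# inclusive: a date equal to a bound counts as inside
inclusive <- v2 >= lo & v2 <= hi
inclusive
#[1]  TRUE FALSE  TRUE FALSE

# exclusive: a date equal to a bound is left out
exclusive <- v2 > lo & v2 < hi
exclusive
#[1] FALSE FALSE  TRUE FALSE
```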
                                                                        
                                                        
            
            
              
                
再見小時候
2020-12-11 07:52

Here's an alternative solution without additional packages.

First, represent strings as dates:

dates <- lapply(strsplit(dateString, " +")[[1L]], as.Date, "%m/%d/%Y")
start <- as.Date(startDate, "%m/%d/%Y")
end <- as.Date(endDate, "%m/%d/%Y")


Second, check whether the dates are between start and end:

sapply(dates, function(x) x >= start && x <= end)
# [1]  TRUE FALSE  TRUE FALSE
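
Because as.Date and the comparison operators are vectorised, the same check can probably be written without lapply/sapply as well. A minimal sketch, where the function name 'dates_in_range' and the bounds in the usage line are hypothetical and not taken from the question:

```r
# hypothetical helper: returns a logical vector, one element per date in 'x'
dates_in_range <- function(x, from, to, format = "%m/%d/%Y") {
  # split the single string into its date substrings
  parts <- strsplit(x, " +")[[1L]]
  # as.Date converts the whole character vector at once
  dates <- as.Date(parts, format)
  # '&' (not '&&') compares element by element
  dates >= as.Date(from, format) & dates <= as.Date(to, format)
}

dateString <- "04/30/2015 03/21/2015 06/28/2015 12/19/2015"
# assumed bounds, for illustration only
in_range <- dates_in_range(dateString, "04/01/2015", "06/30/2015")
in_range
#[1]  TRUE FALSE  TRUE FALSE
```

One thing to watch with this approach: if the string starts with a space, strsplit yields an empty first element, which as.Date turns into NA.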

                                                                        
                                                        
            
            
              
                