filter by date range in sqldf

微笑、不失礼 提交于 2021-01-28 12:22:10

问题


I'm trying to filter a dataframe based on a date range using sqldf, like the example code below. I have data like the example data below. The datedf dataframe that gets returned by sqldf has no records. There are records in the SHV dataframe in that date range, can anyone see what I'm doing wrong and let me know how to filter by date range in sqldf. Dates are always tricky in r for me.

Code:
datedf<-sqldf("select field1                            
            ,fieldDate
                            from SHV
                            where fieldDate between '2004-01-01' and '2005-01-01'
                            ")


Data:

dput(SHV[1:50,c("field1","fieldDate")])
structure(list(field1 = c(1378L, 1653L, 1882L, 2400L, 
2305L, 2051L, 2051L, 2051L, 1796L, 2054L, 2568L, 1290L, 1804L, 
1804L, 3855L, 1297L, 2321L, 2321L, 2321L, 2071L, 2071L, 2074L, 
2588L, 1567L, 1317L, 1317L, 808L, 808L, 1321L, 2350L, 1586L, 
2613L, 1590L, 2614L, 2107L, 1340L, 1085L, 1085L, 2365L, 1344L, 
1601L, 1858L, 1603L, 1603L, 1860L, 2376L, 1355L, 1867L, 2382L, 
1872L), fieldDate = structure(c(12551, NA, NA, 14057, 15337, 
12919, 13336, 10325, 14984, 15643, 12864, 11242, 10749, 11207, 
10602, NA, 12646, 15649, NA, NA, NA, NA, NA, 17015, 13938, NA, 
16693, NA, NA, 12634, 12614, 10689, 12755, 10844, 11375, 4899, 
17298, 10905, 11450, NA, 10330, 15429, 12634, 10504, 12625, 11081, 
10939, NA, 12934, 11176), class = "Date")), .Names = c("field1", 
"fieldDate"), row.names = c(NA, 50L), class = "data.frame")

回答1:


In this sample of data you have no records in that date range:

SHV[SHV$fieldDate >= "2010-01-01" & SHV$fieldDate < "2011-01-01",]
  field1 fieldDate
NA        NA      <NA>
NA.1      NA      <NA>
NA.2      NA      <NA>
NA.3      NA      <NA>
NA.4      NA      <NA>
NA.5      NA      <NA>
NA.6      NA      <NA>
NA.7      NA      <NA>
NA.8      NA      <NA>
NA.9      NA      <NA>
NA.10     NA      <NA>
NA.11     NA      <NA>
NA.12     NA      <NA>



回答2:


Per the sqldf() documentation, one needs to format the dates into their numeric values in order to have them processed as dates. This can be accomplished with sprintf() when generating the SQL query.

SHV <- structure(list(field1 = c(1378L, 1653L, 1882L, 2400L, 
                          2305L, 2051L, 2051L, 2051L, 1796L, 2054L, 2568L, 1290L, 1804L, 
                          1804L, 3855L, 1297L, 2321L, 2321L, 2321L, 2071L, 2071L, 2074L, 
                          2588L, 1567L, 1317L, 1317L, 808L, 808L, 1321L, 2350L, 1586L, 
                          2613L, 1590L, 2614L, 2107L, 1340L, 1085L, 1085L, 2365L, 1344L, 
                          1601L, 1858L, 1603L, 1603L, 1860L, 2376L, 1355L, 1867L, 2382L, 
                          1872L), fieldDate = structure(c(12551, NA, NA, 14057, 15337, 
                                                              12919, 13336, 10325, 14984, 15643, 12864, 11242, 10749, 11207, 
                                                              10602, NA, 12646, 15649, NA, NA, NA, NA, NA, 17015, 13938, NA, 
                                                          16693, NA, NA, 12634, 12614, 10689, 12755, 10844, 11375, 4899, 
                                                          17298, 10905, 11450, NA, 10330, 15429, 12634, 10504, 12625, 11081, 
                                                          10939, NA, 12934, 11176), class = "Date")), .Names = c("field1", 
                                                                                                                 "fieldDate"), row.names = c(NA, 50L), class = "data.frame")

library(sqldf)
sqlStmt <- paste("select field1, fieldDate from SHV",
                 "where fieldDate between ",
                 sprintf("%d and %d",as.Date('2004-01-01','%Y-%m-%d'),
                     as.Date('2005-01-01','%Y-%m-%d')))
datedf<-sqldf(sqlStmt)
datedf

> datedf
  field1  fieldDate
1   1378 2004-05-13
2   2321 2004-08-16
3   2350 2004-08-04
4   1586 2004-07-15
5   1590 2004-12-03
6   1603 2004-08-04
7   1860 2004-07-26
> 

The sprintf() statement converts the dates to a numeric value, which ensures that the between operator in SQL works correctly.

> sqlStmt
[1] "select field1, fieldDate from SHV where fieldDate between  12418 and 12784"
>



回答3:


According to this article, the date field should be converted to a character before executing sqldf.

Before passing any dates to SQLdf we need to first convert them to strings. Otherwise, SQLdf will try to treat them as numbers–which will cause a lot of heartache.

...

Instead, we should convert the DateCreated column to a string instead of a date. Then, SQL will actually convert it from a string to a date.

Confused? Imagine me when I was trying to figure this out on my own.

So your code could be:

SHV$fieldDate <- as.character(SHV$fieldDate)

datedf <- sqldf("
  SELECT
    field1,
    fieldDate
  FROM SHV
  WHERE fieldDate between '2004-01-01' and '2005-01-01'
  --WHERE '2004-01-01' <= fieldDate --and fieldDate <= '2005-01-01'
  ORDER BY fieldDate
")

# Both should equal 7.  Verify that null rows are handled as desired.
nrow(datedf)
sum(as.Date('2004-01-01') <= SHV$fieldDate & SHV$fieldDate <= as.Date('2005-01-01'), na.rm=T)

I wish it had some more explanation about when it casts a date-looking variable into an actual date. If you're looking around more, this SO response by @g-grothendieck takes a different approach and equate the data types inside the sqldf query.



来源:https://stackoverflow.com/questions/47933142/filter-by-date-range-in-sqldf

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!