Vertica date series is starting one month before specified date

Posted by 删除回忆录丶 on 2019-12-13 05:21:18

Question


I work with a Vertica database and I needed a query that, given two dates, returns a list of all months between those dates. For example, if I gave the query 2015-01-01 and 2015-12-31, it would output the following list:

2015-01-01
2015-02-01
2015-03-01
2015-04-01
2015-05-01
2015-06-01
2015-07-01
2015-08-01
2015-09-01
2015-10-01
2015-11-01
2015-12-01

After a bit of digging, I was able to discover the following query:

SELECT date_trunc('MONTH', ts)::date as Mois
FROM 
(
    SELECT '2015-01-01'::TIMESTAMP as tm
    UNION
    SELECT '2015-12-31'::TIMESTAMP as tm
) as t
TIMESERIES ts as '1 month' OVER (ORDER BY tm)

This query works and gives me the following output:

2014-12-01
2015-01-01
2015-02-01
2015-03-01
2015-04-01
2015-05-01
2015-06-01
2015-07-01
2015-08-01
2015-09-01
2015-10-01
2015-11-01
2015-12-01

As you can see, by giving the query a starting date of '2015-01-01', or anywhere in January for that matter, I end up with an extra entry, namely 2014-12-01. In itself, the bug (or whatever you want to call this unexpected behavior) is easy to circumvent (just start in February), but I have to admit my curiosity is piqued. Why exactly is the series starting one month BEFORE the date I specified?

EDIT: Alright, after reading Kimbo's warning and confirming that indeed, long periods will eventually cause problems, I was able to come up with the following query that readjusts the dates correctly.

SELECT ts as originalMonth, 
ts + 
    (
        mod
        (
            day(first_value(ts) over (order by ts)) - day(ts) + day(last_day(ts)), 
            day(last_day(ts))
        )
    ) as adjustedMonth
FROM 
(
    SELECT ts
    FROM 
    (
        SELECT '2015-01-01'::TIMESTAMP as tm
        UNION
        SELECT '2018-12-31'::TIMESTAMP as tm
    ) as t
    TIMESERIES ts as '1 month' OVER (ORDER BY tm)
) as temp
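
To make the arithmetic of the mod(...) expression easier to follow, here is a minimal Python sketch of the same formula. It is a model of the calculation only, not Vertica code. The series values are hypothetical inputs (30 days apart, first record on the 13th), the helper name adjust is made up for illustration, and the sketch assumes the integer result is added to ts as a number of days.

```python
import calendar
from datetime import date, timedelta

def adjust(ts, first_day):
    """Mirror the query's mod(...) expression for one series value."""
    # day(last_day(ts)): number of days in ts's month
    days_in_month = calendar.monthrange(ts.year, ts.month)[1]
    # mod(first_day - day(ts) + days_in_month, days_in_month)
    offset = (first_day - ts.day + days_in_month) % days_in_month
    return ts + timedelta(days=offset)

# Hypothetical series values, 30 days apart, first record on the 13th.
series = [date(2014, 12, 13), date(2015, 1, 12),
          date(2015, 2, 11), date(2015, 3, 13)]
first_day = series[0].day  # first_value(ts) over (order by ts)

adjusted = [adjust(ts, first_day) for ts in series]
for original, fixed in zip(series, adjusted):
    print(original, "->", fixed)
```

In this model every value is moved forward to the 13th, the day of the first record. A first day of 31 cannot be matched in shorter months: adjust(date(2015, 2, 2), 31) gives 2015-02-03.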

The only problem I have is that I have no control over the initial day of the first record of the series. It seems to be set automatically by Vertica to the current day. So if I run this query on the 31st of the month, I wonder how it will behave. I guess I'll just have to wait for December to see, unless someone knows how to get TIMESERIES to behave in a way that would allow me to test it.

EDIT: Okay, so after trying out many different date combinations, I was able to determine that the day on which the series starts changes depending on the date you specify. This caused a whole lot of problems... until we decided to go the simple way. Instead of using a month interval, we used a day interval and selected only one specific day per month. WAY simpler, and it works all the time. Here's the final query:

SELECT ts as originalMonth
FROM 
(
    SELECT ts
    FROM 
    (
        SELECT '2000-02-01'::TIMESTAMP as tm
        UNION
        SELECT '2018-12-31'::TIMESTAMP as tm
    ) as t
    TIMESERIES ts as '1 day' OVER (ORDER BY tm)
) as temp
where day(ts) = 1

Answer 1:


I think it boils down to this statement from the doc: http://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/SQLReferenceManual/Statements/SELECT/TIMESERIESClause.htm

TIME_SLICE can return the start or end time of a time slice, depending on the value of its fourth input parameter (start_or_end). TIMESERIES, on the other hand, always returns the start time of each time slice.

When you define a time interval with some start date (2015-01-01, for example), TIMESERIES ts AS '1 month' creates a first time slice that starts up to 1 month before that first data point, so in December 2014. When you then do DATE_TRUNC('MON', ts), that of course sets the first date value to 2014-12-01, even if your start date is 2015-01-03 or whatever.

Edit: I want to throw out one more warning. Your use of DATE_TRUNC achieves what you need, I think. But, from the doc: "Unlike TIME_SLICE, the time slice length and time unit expressed in [TIMESERIES] length_and_time_unit_expr must be constants so gaps in the time slices are well-defined." This means that '1 month' is actually exactly 30 days, which obviously causes problems if your range covers more than a couple of years.
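
The effect of 30-day slices can be modelled outside the database. The following is a rough Python sketch, not Vertica code. It assumes that '1 month' means exactly 30 days, as described above, and that slices are aligned to 2000-01-01, which is an assumption not stated in this thread. The names ORIGIN, SLICE and slice_starts are made up for illustration.

```python
from datetime import date, timedelta

ORIGIN = date(2000, 1, 1)   # ASSUMED slice alignment point, not from the thread
SLICE = timedelta(days=30)  # '1 month' modelled as exactly 30 days

def slice_starts(first, last):
    """Start dates of every 30-day slice that overlaps [first, last]."""
    # Floor `first` to the slice boundary at or before it.
    start = first - timedelta(days=(first - ORIGIN).days % SLICE.days)
    starts = []
    while start <= last:
        starts.append(start)
        start += SLICE
    return starts

starts = slice_starts(date(2015, 1, 1), date(2015, 12, 31))
# Equivalent of DATE_TRUNC('MONTH', ts)::date in this model
months = [d.replace(day=1) for d in starts]

print("first slice start:", starts[0])
for m in months:
    print(m)

# A longer range, to show what the 30-day drift does to the truncated months.
long_months = [d.replace(day=1)
               for d in slice_starts(date(2015, 1, 1), date(2018, 12, 31))]
print("duplicate months:", len(long_months) - len(set(long_months)))
```

Under these assumptions the first slice starts on 2014-12-13, which truncates to 2014-12-01, and the 2015 range yields the same 13 rows as in the question. Over 2015 through 2018 the drift makes some months appear twice after truncation (January 2017, for example), which is the long-range problem mentioned in the warning.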



Source: https://stackoverflow.com/questions/33673887/vertica-date-series-is-starting-one-month-before-specified-date
