How to Determine Values for Missing Months based on Data of Previous Months in T-SQL

庸人自扰 2020-12-09 14:24

I have a set of transactions occurring at specific points in time:

CREATE TABLE Transactions (
    TransactionDate Date NOT NULL,
    TransactionValue Intege         


        
7 Answers
  • 2020-12-09 14:38

    John Gibb posted a fine answer, already accepted, but I wanted to expand on it a bit to:

    • eliminate the one year limitation,
    • expose the date range in a more explicit manner, and
    • eliminate the need for a separate numbers table.

    This slight variation uses a recursive common table expression to build the set of Dates: the first of each month falling on or after FromDate and up to ToDate, as defined in DateRange. Note the MAXRECURSION option: SQL Server stops a recursive CTE after 100 levels by default, so the limit is raised here to 120; adjust as necessary to accommodate the maximum number of months expected. Also, consider adding alternative Dates assembly logic to support weeks, quarters, or even individual days.

    with 
    DateRange(FromDate, ToDate) as (
      select 
        Cast('11/1/2008' as DateTime), 
        Cast('2/15/2010' as DateTime)
    ),
    Dates(Date) as (
      select 
        Case Day(FromDate) 
          When 1 Then FromDate
          Else DateAdd(month, 1, DateAdd(month, ((Year(FromDate)-1900)*12)+Month(FromDate)-1, 0))
        End
      from DateRange
      union all
      select DateAdd(month, 1, Date)
      from Dates
      where DateAdd(month, 1, Date) <= (select ToDate from DateRange)  -- do not step past ToDate
    )
    select 
      d.Date, t.TransactionValue
    from Dates d
    outer apply (
      select top 1 TransactionValue
      from Transactions
      where TransactionDate <= d.Date
      order by TransactionDate desc
    ) t
    option (maxrecursion 120);
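
    As a rough illustration of the "alternative Dates assembly logic" mentioned above, here is a sketch of a weekly variant. The date range is made up for the example, and the weekly step starting at FromDate is an assumption about what a weekly report would want; only the Dates CTE changes.

    ```sql
    with
    DateRange(FromDate, ToDate) as (
      -- Hypothetical range, chosen only for illustration
      select Cast('20081101' as DateTime), Cast('20081231' as DateTime)
    ),
    Dates(Date) as (
      -- Anchor: start on FromDate itself
      select FromDate from DateRange
      union all
      -- Step one week at a time, never past ToDate
      select DateAdd(week, 1, Date)
      from Dates
      where DateAdd(week, 1, Date) <= (select ToDate from DateRange)
    )
    select d.Date, t.TransactionValue
    from Dates d
    outer apply (
      -- Latest transaction on or before each weekly date
      select top 1 TransactionValue
      from Transactions
      where TransactionDate <= d.Date
      order by TransactionDate desc
    ) t
    option (maxrecursion 120);
    ```

    Swapping week for quarter or day in both DateAdd calls gives the other granularities; a daily range longer than 120 days needs a larger MAXRECURSION value.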
    
  • 2020-12-09 14:39

    To do it in a set-based way, you need sets for all of your data or information. In this case there's the overlooked data of "What months are there?" It's very useful to have a "Calendar" table as well as a "Number" table in databases as utility tables.

    Here's a solution using one of these methods. The first bit of code sets up your calendar table. You can fill it using a cursor or manually or whatever and you can limit it to whatever date range is needed for your business (back to 1900-01-01 or just back to 1970-01-01 and as far into the future as you want). You can also add any other columns that are useful for your business.

    CREATE TABLE dbo.Calendar
    (
         date           DATETIME     NOT NULL,
         is_holiday     BIT          NOT NULL,
         CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED (date)
    )
    
    INSERT INTO dbo.Calendar (date, is_holiday) VALUES ('2009-01-01', 1)  -- New Year
    INSERT INTO dbo.Calendar (date, is_holiday) VALUES ('2009-01-02', 1)
    ...
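
    If typing the rows by hand is not appealing, the table can also be filled in one statement. This is only a sketch: it assumes the table is still empty (it would collide with the sample rows above on the primary key), covers 2009 only, and marks every day as a non-holiday so that the real holidays can be updated afterwards.

    ```sql
    -- Generate one row per day of 2009 with a recursive CTE
    WITH Days(d) AS (
        SELECT CAST('20090101' AS DATETIME)
        UNION ALL
        SELECT DATEADD(day, 1, d)
        FROM Days
        WHERE d < '20091231'
    )
    INSERT INTO dbo.Calendar (date, is_holiday)
    SELECT d, 0               -- assumption: flag holidays later with an UPDATE
    FROM Days
    OPTION (MAXRECURSION 366); -- the default limit of 100 is too low for a full year
    ```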
    

    Now, using this table your question becomes trivial:

    SELECT
         CAST(MONTH(date) AS VARCHAR) + '/' + CAST(YEAR(date) AS VARCHAR) AS [Month],
         T1.TransactionValue AS [Value]
    FROM
         dbo.Calendar C
    LEFT OUTER JOIN dbo.Transactions T1 ON
         T1.TransactionDate <= C.date
    LEFT OUTER JOIN dbo.Transactions T2 ON
         T2.TransactionDate > T1.TransactionDate AND
         T2.TransactionDate <= C.date
    WHERE
         DAY(C.date) = 1 AND
         T2.TransactionDate IS NULL AND
         C.date BETWEEN '2009-01-01' AND '2009-12-31'  -- You can use whatever range you want
    
  • 2020-12-09 14:40

    -----Alternative way------

    select 
        d.firstOfMonth,
        MONTH(d.firstOfMonth) as Mon,
        YEAR(d.firstOfMonth) as Yr, 
        t.TransactionValue
    from (
        select 
            dateadd( month, inMonths - 1, '1/1/2009') as firstOfMonth 
            from (
                values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12)
            ) Dates(inMonths)
    ) d
    outer apply (
        select top 1 TransactionValue
        from Transactions
        where TransactionDate <= d.firstOfMonth
        order by TransactionDate desc
    ) t
    
  • 2020-12-09 14:45

    I'd start by building a Numbers table holding sequential integers from 1 to a million or so. They come in really handy once you get the hang of it.

    For example, here is how to get the 1st of every month in 2008:

    select firstOfMonth = dateadd( month, n - 1, '1/1/2008')
    from Numbers
    where n <= 12;
    

    Now, you can put that together using OUTER APPLY to find the most recent transaction for each date like so:

    with Dates as (
        select firstOfMonth = dateadd( month, n - 1, '1/1/2008')
        from Numbers
        where n <= 12
    )
    select d.firstOfMonth, t.TransactionValue
    from Dates d
    outer apply (
        select top 1 TransactionValue
        from Transactions
        where TransactionDate <= d.firstOfMonth
        order by TransactionDate desc
    ) t;
    

    This should give you what you're looking for, but you might have to Google around a little to find the best way to create the Numbers table.
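
    For reference, one common way to build it is sketched below. It assumes no table named dbo.Numbers exists yet; the column name n matches the queries above, and the million-row size is just the figure suggested earlier.

    ```sql
    CREATE TABLE dbo.Numbers (n INT NOT NULL PRIMARY KEY);

    WITH
      E1(x) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(x)), -- 10 rows
      E3(x) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b CROSS JOIN E1 c),                 -- 1,000 rows
      E6(x) AS (SELECT 1 FROM E3 a CROSS JOIN E3 b)                                   -- 1,000,000 rows
    INSERT INTO dbo.Numbers (n)
    -- Number the rows 1..1,000,000; the ORDER BY is a placeholder because any order will do
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM E6;
    ```

    The cross joins avoid both a loop and the recursion limit, which is why this shape tends to be preferred over a recursive CTE for large ranges.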

  • 2020-12-09 14:45

    If you do this type of analysis often, you might be interested in this SQL Server function I put together for exactly this purpose:

    if exists (select * from dbo.sysobjects where name = 'fn_daterange') drop function fn_daterange;
    go
    
    create function fn_daterange
       (
       @MinDate as datetime,
       @MaxDate as datetime,
       @intval  as datetime
       )
    returns table
    --**************************************************************************
    -- Procedure: fn_daterange()
    --    Author: Ron Savage
    --      Date: 12/16/2008
    --
    -- Description:
    -- This function takes a starting and ending date and an interval, then
    -- returns a table of all the dates in that range at the specified interval.
    --
    -- Change History:
    -- Date        Init. Description
    -- 12/16/2008  RS    Created.
    -- **************************************************************************
    as
    return
       WITH times (startdate, enddate, intervl) AS
          (
          SELECT @MinDate as startdate, @MinDate + @intval - .0000001 as enddate, @intval as intervl
             UNION ALL
          SELECT startdate + intervl as startdate, enddate + intervl as enddate, intervl as intervl
          FROM times
          WHERE startdate + intervl <= @MaxDate
          )
       select startdate, enddate from times;
    
    go
    

    It was originally an answer to another question, which also has some sample output from it.
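
    A usage sketch, assuming the function was created in your default schema: the interval is a datetime offset, so 1 means one day and 7 means one week. The dates below are made up for the example.

    ```sql
    -- One row per day for the first week of December 2008
    select startdate, enddate
    from fn_daterange('20081201', '20081207', 1)  -- 1 converts implicitly to a one-day datetime offset
    option (maxrecursion 366);                    -- only needed once a range exceeds 100 steps
    ```

    Because a calendar month is not a fixed number of days, a fixed interval like this cannot produce the first of every month on its own; it fits daily or weekly ranges better.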

  • 2020-12-09 14:54

    I don't have access to BOL from my phone so this is a rough guide...

    First, you need to generate the missing rows for the months that have no data. You can use an OUTER JOIN to a fixed table or temp table covering the timespan you want, or to a programmatically created dataset (stored proc or suchlike).

    Second, you should look at the new SQL 2008 'analytic' functions, like MAX(value) OVER ( partition clause ) to get the previous value.

    (I KNOW Oracle can do this 'cause I needed it to calculate compounded interest calcs between transaction dates - same problem really)

    Hope this points you in the right direction...

    (Avoid throwing it into a temp table and cursoring over it. Too crude!!!)
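
    The two steps above might be sketched as follows. This is an illustration rather than a tested solution: it assumes SQL Server 2012 or later (the running COUNT needs ORDER BY inside OVER, which SQL Server 2008 does not support), a non-NULL TransactionValue, and a hard-coded list of months for 2009.

    ```sql
    WITH Months AS (
        -- Step 1: the month starts to report on (hypothetical fixed list for 2009)
        SELECT DATEADD(month, v.n, CAST('20090101' AS date)) AS firstOfMonth
        FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11)) v(n)
    ),
    Combined AS (
        -- Month markers carry no value; transactions carry theirs
        SELECT firstOfMonth AS dt, CAST(NULL AS int) AS val, 1 AS isMonth FROM Months
        UNION ALL
        SELECT TransactionDate, TransactionValue, 0 FROM Transactions
    ),
    Grouped AS (
        -- COUNT(val) skips NULLs, so the running count only rises at a transaction row;
        -- each month marker lands in the group of the latest transaction on or before it
        SELECT dt, val, isMonth,
               COUNT(val) OVER (ORDER BY dt, isMonth ROWS UNBOUNDED PRECEDING) AS grp
        FROM Combined
    ),
    Filled AS (
        -- Step 2: each group holds exactly one non-NULL value, so MAX picks it up
        SELECT dt, isMonth, MAX(val) OVER (PARTITION BY grp) AS val
        FROM Grouped
    )
    SELECT dt AS firstOfMonth, val AS TransactionValue
    FROM Filled
    WHERE isMonth = 1
    ORDER BY dt;
    ```

    The filter on isMonth sits in the outer query on purpose: window functions are evaluated after WHERE, so filtering inside Filled would hide the transaction rows before MAX could see them.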
