SQL join against date ranges?

故里飘歌 2020-12-05 00:57

Consider two tables:

Transactions, with amounts in a foreign currency:

     Date  Amount
========= =======
 1/2/2009    1500
 2/4/         


        
6 Answers
  • 2020-12-05 01:13
    SELECT 
        a.tranDate, 
        a.Amount,
        a.Amount/a.Rate as convertedRate
    FROM
        (
    
        SELECT 
            t.date tranDate,
            e.date as rateDate,
            t.Amount,
            e.rate,
            RANK() OVER (Partition BY t.date ORDER BY
                             CASE WHEN DATEDIFF(day,e.date,t.date) < 0 THEN
                                       DATEDIFF(day,e.date,t.date) * -100000
                                  ELSE DATEDIFF(day,e.date,t.date)
                             END ) AS diff
        FROM 
            ExchangeRates e
        CROSS JOIN 
            Transactions t
             ) a
    WHERE a.diff = 1
    

    The difference between the transaction date and the rate date is calculated. Negative values (condition b, where the rate date is later than the transaction date) are multiplied by -100000 so that they can still be ranked, but non-negative values (condition a) always take priority. We then select the minimum date difference for each transaction date using the RANK() OVER clause.
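    To make the ranking key concrete, here is a minimal Python sketch of the same CASE expression applied outside the database. The rate values mirror those used elsewhere in this thread; the function names `rank_key` and `pick_rate` are hypothetical and only illustrate the idea.

```python
from datetime import date

# Sample rates (same values as the ExchangeRate rows used later in the thread).
RATES = [
    (date(2009, 2, 1), 40.1),
    (date(2009, 3, 1), 41.0),
    (date(2009, 4, 1), 38.5),
    (date(2009, 5, 1), 42.7),
]

def rank_key(tran_date, rate_date):
    """Mirror of the CASE expression: the smallest key gets rank 1."""
    diff = (tran_date - rate_date).days  # same sign as DATEDIFF(day, e.date, t.date)
    # Rates dated after the transaction are pushed far behind any earlier rate.
    return diff * -100000 if diff < 0 else diff

def pick_rate(tran_date):
    """Return the (rate_date, rate) pair that would receive rank 1."""
    return min(RATES, key=lambda r: rank_key(tran_date, r[0]))

early = pick_rate(date(2009, 1, 2))   # before every rate: falls back to the earliest rate
usual = pick_rate(date(2009, 3, 15))  # most recent rate on or before the transaction
print(early, usual)
```

    The multiplier only works while every non-negative difference stays below 100000 days, which is an assumption of the approach rather than something the query enforces.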

  • 2020-12-05 01:20

    I can't test this, but I think it would work. It uses coalesce with two sub-queries to pick the rate by rule A or rule B.

    Select t.Date, t.Amount, 
      ConvertedAmount = t.Amount/coalesce(    
        (Select Top 1 ex.Rate 
            From ExchangeRates ex 
            Where t.Date > ex.Date 
            Order by ex.Date desc )
         ,
         (Select Top 1 ex.Rate        -- fallback: the earliest known rate
            From ExchangeRates ex     -- the alias ex was missing here
            Order by ex.Date asc)
        ) 
    From Transactions t
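    Since the query above is untested, here is a small runnable sketch of the same COALESCE idea using Python's built-in sqlite3 module as a stand-in for SQL Server. It assumes ISO-8601 text dates, uses LIMIT 1 in place of TOP 1, and the second transaction row (820) is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ExchangeRates (Date TEXT NOT NULL, Rate REAL NOT NULL);
    CREATE TABLE Transactions  (Date TEXT NOT NULL, Amount REAL NOT NULL);
    INSERT INTO ExchangeRates VALUES ('2009-02-01', 40.1), ('2009-03-01', 41.0),
                                     ('2009-04-01', 38.5), ('2009-05-01', 42.7);
    -- Hypothetical sample transactions (ISO dates compare correctly as text).
    INSERT INTO Transactions  VALUES ('2009-01-02', 1500), ('2009-03-15', 820);
""")

QUERY = """
    SELECT t.Date, t.Amount,
           COALESCE(
               -- Rule A: most recent rate strictly before the transaction date
               -- (strict comparison as in the answer; use <= if a rate
               -- should already apply on its own date).
               (SELECT ex.Rate FROM ExchangeRates ex
                 WHERE ex.Date < t.Date
                 ORDER BY ex.Date DESC LIMIT 1),
               -- Rule B: no earlier rate exists, so take the earliest one.
               (SELECT ex.Rate FROM ExchangeRates ex
                 ORDER BY ex.Date ASC LIMIT 1)
           ) AS Rate
      FROM Transactions t
     ORDER BY t.Date
"""
rows = conn.execute(QUERY).fetchall()
for tran_date, amount, rate in rows:
    print(tran_date, amount, rate, amount / rate)
```

    The first transaction predates every rate and so picks up the earliest one through the second sub-query; the other one is resolved by the first sub-query.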
    
  • 2020-12-05 01:23

    You could first do a self-join on the exchange rates, ordered by date, so that you have the start and end date of each exchange rate without any overlap or gap in the dates (maybe add that as a view to your database; in my case I'm just using a common table expression).

    Now joining those "prepared" rates with the transactions is simple and efficient.

    Something like:

    WITH IndexedExchangeRates AS (           
                SELECT  Row_Number() OVER (ORDER BY Date) ix,
                        Date,
                        Rate 
                FROM    ExchangeRates 
            ),
            RangedExchangeRates AS (             
                SELECT  CASE WHEN IER.ix=1 THEN CAST('1753-01-01' AS datetime) 
                        ELSE IER.Date 
                        END DateFrom,
                        COALESCE(IER2.Date, GETDATE()) DateTo,
                        IER.Rate 
                FROM    IndexedExchangeRates IER 
                LEFT JOIN IndexedExchangeRates IER2 
                ON IER.ix = IER2.ix-1 
            )
    SELECT  T.Date,
            T.Amount,
            RER.Rate,
            T.Amount/RER.Rate ConvertedAmount 
    FROM    Transactions T 
    LEFT JOIN RangedExchangeRates RER 
    ON (T.Date > RER.DateFrom) AND (T.Date <= RER.DateTo)
    

    Notes:

    • You could replace GETDATE() with a date in the far future, I'm assuming here that no rates for the future are known.

    • Rule (B) is implemented by setting the date of the first known exchange rate to the minimal date supported by the SQL Server datetime, which should (by definition if it is the type you're using for the Date column) be the smallest value possible.

  • 2020-12-05 01:25

    Suppose you had an extended exchange rate table that contained:

     Start Date   End Date    Rate
     ========== ========== =======
     0001-01-01 2009-01-31    40.1
     2009-02-01 2009-02-28    40.1
     2009-03-01 2009-03-31    41.0
     2009-04-01 2009-04-30    38.5
     2009-05-01 9999-12-31    42.7
    

    We can discuss the details of whether the first two rows should be combined, but the general idea is that it is trivial to find the exchange rate for a given date. This structure works with the SQL 'BETWEEN' operator, which includes both ends of the range. Often, a better format for ranges is 'closed-open' (half-open): the first date listed is included and the second is excluded. Note that there is a constraint on the data rows: there are (a) no gaps in the coverage of the range of dates and (b) no overlaps in the coverage. Enforcing those constraints is not completely trivial (polite understatement - meiosis).
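    As an illustration of the two constraints, the following Python sketch checks a list of inclusive (start, end) date ranges for gaps and overlaps. The helper name `coverage_problems` is hypothetical, and this is only one way to validate the data outside the database, not something the schema enforces.

```python
from datetime import date

def coverage_problems(ranges):
    """Check inclusive (start, end) ranges; return ("gap"|"overlap", prev_end, next_start) tuples."""
    problems = []
    ordered = sorted(ranges)  # sort by start date
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        step = (next_start - prev_end).days
        if step > 1:        # at least one uncovered day in between
            problems.append(("gap", prev_end, next_start))
        elif step < 1:      # the next range starts on or before the previous end
            problems.append(("overlap", prev_end, next_start))
    return problems

# The extended exchange rate ranges from the table above.
extended = [
    (date(1, 1, 1),    date(2009, 1, 31)),
    (date(2009, 2, 1), date(2009, 2, 28)),
    (date(2009, 3, 1), date(2009, 3, 31)),
    (date(2009, 4, 1), date(2009, 4, 30)),
    (date(2009, 5, 1), date(9999, 12, 31)),
]
ok = coverage_problems(extended)

# Dropping the March row leaves a hole between 2009-02-28 and 2009-04-01.
broken = coverage_problems(extended[:2] + extended[3:])
print(ok, broken)
```

    Comparing only neighbouring ranges is enough after sorting by start date: if any two ranges overlap, the earlier one also overlaps its immediate successor.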

    Now the basic query is trivial, and Case B is no longer a special case:

    SELECT T.Date, T.Amount, X.Rate
      FROM Transactions AS T JOIN ExtendedExchangeRate AS X
           ON T.Date BETWEEN X.StartDate AND X.EndDate;
    

    The tricky part is creating the ExtendedExchangeRate table from the given ExchangeRate table on the fly. If it is an option, then revising the structure of the basic ExchangeRate table to match the ExtendedExchangeRate table would be a good idea; you resolve the messy stuff when the data is entered (once a month) instead of every time an exchange rate needs to be determined (many times a day).

    How to create the extended exchange rate table? If your system supports adding or subtracting 1 from a date value to obtain the next or previous day (and has a single row table called 'Dual'), then a variation on this will work (without using any OLAP functions):

    CREATE TABLE ExchangeRate
    (
        Date    DATE NOT NULL,
        Rate    DECIMAL(10,5) NOT NULL
    );
    INSERT INTO ExchangeRate VALUES('2009-02-01', 40.1);
    INSERT INTO ExchangeRate VALUES('2009-03-01', 41.0);
    INSERT INTO ExchangeRate VALUES('2009-04-01', 38.5);
    INSERT INTO ExchangeRate VALUES('2009-05-01', 42.7);
    

    First row:

    SELECT '0001-01-01' AS StartDate,
           (SELECT MIN(Date) - 1 FROM ExchangeRate) AS EndDate,
           (SELECT Rate FROM ExchangeRate
             WHERE Date = (SELECT MIN(Date) FROM ExchangeRate)) AS Rate
    FROM Dual;
    

    Result:

    0001-01-01  2009-01-31      40.10000
    

    Last row:

    SELECT (SELECT MAX(Date) FROM ExchangeRate) AS StartDate,
           '9999-12-31' AS EndDate,
           (SELECT Rate FROM ExchangeRate
             WHERE Date = (SELECT MAX(Date) FROM ExchangeRate)) AS Rate
    FROM Dual;
    

    Result:

    2009-05-01  9999-12-31      42.70000
    

    Middle rows:

    SELECT X1.Date     AS StartDate,
           X2.Date - 1 AS EndDate,
           X1.Rate     AS Rate
      FROM ExchangeRate AS X1 JOIN ExchangeRate AS X2
           ON X1.Date < X2.Date
     WHERE NOT EXISTS
           (SELECT *
              FROM ExchangeRate AS X3
             WHERE X3.Date > X1.Date AND X3.Date < X2.Date
            );
    

    Result:

    2009-02-01  2009-02-28      40.10000
    2009-03-01  2009-03-31      41.00000
    2009-04-01  2009-04-30      38.50000
    

    Note that the NOT EXISTS sub-query is rather crucial. Without it, the 'middle rows' result is:

    2009-02-01  2009-02-28      40.10000
    2009-02-01  2009-03-31      40.10000    # Unwanted
    2009-02-01  2009-04-30      40.10000    # Unwanted
    2009-03-01  2009-03-31      41.00000
    2009-03-01  2009-04-30      41.00000    # Unwanted
    2009-04-01  2009-04-30      38.50000
    

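    The number of unwanted rows grows quadratically as the table increases in size: with N rows there are N * (N - 1) / 2 pairs satisfying X1.Date < X2.Date, of which only the N - 1 adjacent pairs are wanted, leaving (N - 1) * (N - 2) / 2 unwanted rows (3 for the four-row example above).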
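    A quick brute-force count supports this: the join produces every pair with X1.Date < X2.Date, and only adjacent pairs survive the NOT EXISTS filter. The sketch below uses integers in place of dates, and `unwanted_rows` is a hypothetical helper name.

```python
from itertools import combinations

def unwanted_rows(n):
    """Count the rows the join yields for n rates that NOT EXISTS would remove."""
    dates = range(n)                        # stand-ins for n distinct, sorted dates
    pairs = list(combinations(dates, 2))    # every (X1, X2) with X1.Date < X2.Date
    adjacent = [(a, b) for a, b in pairs if b == a + 1]  # no X3 strictly between them
    return len(pairs) - len(adjacent)

for n in (2, 3, 4, 10):
    print(n, unwanted_rows(n), (n - 1) * (n - 2) // 2)
```

    For the four-row sample this yields 3 unwanted rows, matching the listing above.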

    The result for ExtendedExchangeRate is the (disjoint) UNION of the three queries:

    SELECT DATE '0001-01-01' AS StartDate,
           (SELECT MIN(Date) - 1 FROM ExchangeRate) AS EndDate,
           (SELECT Rate FROM ExchangeRate
             WHERE Date = (SELECT MIN(Date) FROM ExchangeRate)) AS Rate
    FROM Dual
    UNION
    SELECT X1.Date     AS StartDate,
           X2.Date - 1 AS EndDate,
           X1.Rate     AS Rate
      FROM ExchangeRate AS X1 JOIN ExchangeRate AS X2
           ON X1.Date < X2.Date
     WHERE NOT EXISTS
           (SELECT *
              FROM ExchangeRate AS X3
             WHERE X3.Date > X1.Date AND X3.Date < X2.Date
            )
    UNION
    SELECT (SELECT MAX(Date) FROM ExchangeRate) AS StartDate,
           DATE '9999-12-31' AS EndDate,
           (SELECT Rate FROM ExchangeRate
             WHERE Date = (SELECT MAX(Date) FROM ExchangeRate)) AS Rate
    FROM Dual;
    

    On the test DBMS (IBM Informix Dynamic Server 11.50.FC6 on MacOS X 10.6.2), I was able to convert the query into a view but I had to stop cheating with the data types - by coercing the strings into dates:

    CREATE VIEW ExtendedExchangeRate(StartDate, EndDate, Rate) AS
        SELECT DATE('0001-01-01')  AS StartDate,
               (SELECT MIN(Date) - 1 FROM ExchangeRate) AS EndDate,
               (SELECT Rate FROM ExchangeRate WHERE Date = (SELECT MIN(Date) FROM ExchangeRate)) AS Rate
        FROM Dual
        UNION
        SELECT X1.Date     AS StartDate,
               X2.Date - 1 AS EndDate,
               X1.Rate     AS Rate
          FROM ExchangeRate AS X1 JOIN ExchangeRate AS X2
               ON X1.Date < X2.Date
         WHERE NOT EXISTS
               (SELECT *
                  FROM ExchangeRate AS X3
                 WHERE X3.Date > X1.Date AND X3.Date < X2.Date
                )
        UNION 
        SELECT (SELECT MAX(Date) FROM ExchangeRate) AS StartDate,
               DATE('9999-12-31') AS EndDate,
               (SELECT Rate FROM ExchangeRate WHERE Date = (SELECT MAX(Date) FROM ExchangeRate)) AS Rate
        FROM Dual;
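    For readers without Informix at hand, here is a hedged sketch of the same three-part UNION using Python's built-in sqlite3 module. It assumes ISO-8601 text dates, replaces `Date - 1` with SQLite's `date(x, '-1 day')`, needs no Dual table, and adds a made-up Transactions table to exercise the BETWEEN join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ExchangeRate (Date TEXT NOT NULL, Rate REAL NOT NULL);
    INSERT INTO ExchangeRate VALUES ('2009-02-01', 40.1), ('2009-03-01', 41.0),
                                    ('2009-04-01', 38.5), ('2009-05-01', 42.7);
    -- Hypothetical sample transactions.
    CREATE TABLE Transactions (Date TEXT NOT NULL, Amount REAL NOT NULL);
    INSERT INTO Transactions VALUES ('2009-01-02', 1500), ('2009-03-15', 820);

    CREATE VIEW ExtendedExchangeRate(StartDate, EndDate, Rate) AS
        -- First row: extend the earliest rate back to the minimum date.
        SELECT '0001-01-01',
               (SELECT date(MIN(Date), '-1 day') FROM ExchangeRate),
               (SELECT Rate FROM ExchangeRate
                 WHERE Date = (SELECT MIN(Date) FROM ExchangeRate))
        UNION
        -- Middle rows: each rate runs until the day before the next rate.
        SELECT X1.Date, date(X2.Date, '-1 day'), X1.Rate
          FROM ExchangeRate AS X1 JOIN ExchangeRate AS X2 ON X1.Date < X2.Date
         WHERE NOT EXISTS (SELECT 1 FROM ExchangeRate AS X3
                            WHERE X3.Date > X1.Date AND X3.Date < X2.Date)
        UNION
        -- Last row: extend the latest rate forward to the maximum date.
        SELECT (SELECT MAX(Date) FROM ExchangeRate),
               '9999-12-31',
               (SELECT Rate FROM ExchangeRate
                 WHERE Date = (SELECT MAX(Date) FROM ExchangeRate));
""")

ranges = conn.execute(
    "SELECT StartDate, EndDate, Rate FROM ExtendedExchangeRate ORDER BY StartDate"
).fetchall()

# The basic lookup: BETWEEN includes both ends of each range.
lookup = conn.execute("""
    SELECT T.Date, X.Rate
      FROM Transactions AS T JOIN ExtendedExchangeRate AS X
        ON T.Date BETWEEN X.StartDate AND X.EndDate
     ORDER BY T.Date
""").fetchall()

for row in ranges:
    print(row)
print(lookup)
```

    With the view in place, the transaction dated before the first rate needs no special handling; it simply falls into the first range.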
    
  • 2020-12-05 01:33

    Many solutions will work. You should really find the one that works best (fastest) for your workload: do you usually search for one transaction, a list of them, or all of them?

    The tie-breaker solution given your schema is:

    SELECT      t.Date,
                t.Amount,
                r.Rate
                --//add your multiplication/division here
    
    FROM        "Transactions" t
    
    INNER JOIN  "ExchangeRates" r
            ON  r."ExchangeRateID" = (
                            SELECT TOP 1 x."ExchangeRateID"
                            FROM        "ExchangeRates" x
                            WHERE       x."SourceCurrencyISO" = t."SourceCurrencyISO" --//these are currency-related filters for your tables
                                    AND x."TargetCurrencyISO" = t."TargetCurrencyISO" --//,which you should also JOIN on
                                    AND x."Date" <= t."Date"
                            ORDER BY    x."Date" DESC)
    

    You need the right indexes for this query to be fast. Ideally, you should also JOIN not on "Date" but on an "ID"-like (INTEGER) field. Give me more schema info and I will create an example for you.

  • 2020-12-05 01:35

    There's nothing about a join that will be more elegant than the TOP 1 correlated subquery in your original post. However, as you say, it doesn't satisfy requirement B.

    These queries do work (SQL Server 2005 or later required). See the SqlFiddle for these.

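    SELECT
       T.*,
       ExchangeRate = E.Rate
    FROM
      dbo.Transactions T
      CROSS APPLY (
        SELECT TOP 1 Rate
        FROM dbo.ExchangeRate E
        -- No WHERE filter on E.RateDate <= T.TranDate here: it would drop
        -- transactions dated before the first rate (requirement B).
        ORDER BY
          CASE WHEN E.RateDate <= T.TranDate THEN 0 ELSE 1 END,         -- earlier-or-equal rates first
          CASE WHEN E.RateDate <= T.TranDate THEN E.RateDate END DESC,  -- most recent of those
          E.RateDate                                                    -- otherwise the earliest rate
      ) E;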
    

    Note that the CROSS APPLY with a single column value is functionally equivalent to the correlated subquery in the SELECT clause as you showed. I just prefer CROSS APPLY now because it is much more flexible and lets you reuse the value in multiple places, have multiple rows in it (for custom unpivoting) and lets you have multiple columns.

    SELECT
       T.*,
       ExchangeRate = Coalesce(E.Rate, E2.Rate)
    FROM
      dbo.Transactions T
      OUTER APPLY (
        SELECT TOP 1 Rate
        FROM dbo.ExchangeRate E
        WHERE E.RateDate <= T.TranDate
        ORDER BY E.RateDate DESC
      ) E
      OUTER APPLY (
        SELECT TOP 1 Rate
        FROM dbo.ExchangeRate E2
        WHERE E.Rate IS NULL
        ORDER BY E2.RateDate
      ) E2;
    

    I don't know which one might perform better, or if either will perform better than other answers on the page. With a proper index on the Date columns, they should zing pretty well--definitely better than any Row_Number() solution.
