Databricks : Equivalent code for SQL query

China☆狼群 提交于 2021-01-28 18:22:16

问题


I'm looking for the equivalent databricks code for the query. I added some sample code and the expected as well, but in particular I'm looking for the equivalent code in Databricks for the query. For the moment I'm stuck on the CROSS APPLY STRING SPLIT part.

Sample SQL data:

    CREATE TABLE FactTurnover
    (
    ID INT,
    SalesPriceExcl NUMERIC (9,4),
  Discount VARCHAR(100)
  )
 INSERT INTO FactTurnover
 VALUES 
   (1, 100, '10'),
   (2, 39.5877, '58, 12'),
   (3, 100, '50, 10, 15'),
   (4, 100, 'B')

Query:

    ;WITH CTE AS
    (
     SELECT Id, SalesPriceExcl, 
         CASE WHEN value = 'B' THEN 0
         ELSE CAST(value as int) END AS Discount
     From FactTurnover
     CROSS APPLY STRING_SPLIT(Discount, ',')
     )
     SELECT Id,  
       Min(SalesPriceExcl) AS SalesPriceExcludingDiscount,
       EXP(SUM(LOG((100 - Discount) / 100.0))) As TotalDiscount,
       Cast(EXP(SUM(LOG((100 - Discount) / 100.0))) * 
            MIN(SalesPriceExcl) As Numeric(9,2))
        PriceAfterDiscount
     FROM CTE
     GROUP BY ID

Expected Results:

| Id | SalesPriceExcludingDiscount |       TotalDiscount | PriceAfterDiscount |

|----|-----------------------------|---------------------|--------------------|

|  1 |                         100 |                 0.9 |                 90 |

|  2 |                     39.5877 | 0.36960000000000004 |              14.63 |

|  3 |                         100 | 0.38250000000000006 |              38.25 |

|  4 |                         100 |                   1 |                100 |

回答1:


Use SPLIT to convert the comma-separated string to an array then use LATERAL VIEW and EXPLODE to do operations on the elements of that array. The roughly equivalent syntax (including CTEs) is:

%sql
--SELECT * FROM FactTurnover;

WITH cte AS
(
SELECT *
FROM
  (
  SELECT Id, SalesPriceExcl, SPLIT ( Discount, ',' ) AS discountArray
  FROM FactTurnover
  ) x
  LATERAL VIEW EXPLODE ( discountArray ) x AS xdiscount
)
SELECT 
  Id,
  MIN(SalesPriceExcl) AS SalesPriceExcludingDiscount,
  EXP ( SUM( LOG( ( 100 - xdiscount ) / 100.00 ) ) ) AS TotalDiscount
FROM cte
GROUP BY Id
ORDER BY Id

If you are feeling brave, you could also do this using higher order functions. I've included two examples below. I would say these are harder to debug and you should probably try them performance-wise, it depends what you're comfortable with:

%sql
-- Convert Discount text column to array with SPLIT function and filter out value 'B' from the array
;WITH filterB AS (
SELECT *, FILTER ( SPLIT ( Discount, ',' ), x -> x != 'B' ) discountArray
FROM FactTurnover
), cte1 AS (
-- Do initial calcs on array
SELECT 
  Id,
  TRANSFORM ( discountArray, discountArray -> LOG( ( 100 - discountArray ) / 100.00 ) ) discountArray2
FROM filterB
)
SELECT
  Id,
  EXP( AGGREGATE ( discountArray2, CAST( 0 AS DOUBLE ), ( x, y ) -> x + y ) ) AS x
FROM cte1;

-- all in one example
SELECT 
    Id,
    EXP( AGGREGATE( TRANSFORM( FILTER ( SPLIT ( Discount, ',' ), x -> x != 'B' ), y -> LOG( ( 100 - y ) / 100.00 ) ), CAST( 0 AS DOUBLE ), ( z, a ) -> z + a ) )
    AS final
FROM FactTurnover
ORDER BY Id


来源:https://stackoverflow.com/questions/57994948/databricks-equivalent-code-for-sql-query

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!