How to get the 3rd report to combine the customer and order data

Submitted by 放荡痞女 on 2019-12-24 11:21:43

Question


I have a question about retention rate.
I have 2 tables: the customer data and the order data.

 DISTRIBUTOR as d
+---------+-----------+--------------+--------------------+
|   ID    | SETUP_DT  | REINSTATE_DT | LOCAL_REINSTATE_DT |
+---------+-----------+--------------+--------------------+
| C111111 | 2018/1/1  | Null         | Null               |
| C111112 | 2015/12/9 | 2018/10/25   | 2018/10/25         |
| C111113 | 2018/10/1 | Null         | Null               |
| C111114 | 2018/10/6 | 2018/12/14   | 2018/12/14         |
+---------+-----------+--------------+--------------------+
 ORDER as o (please note that the data is for reference only)
+---------+----------+-----+
|   ID    |  ORD_DT  | OAL |
+---------+----------+-----+
| C111111 | 2018/1/1 | 112 |
| C111111 | 2018/1/1 | 100 |
| C111111 | 2018/1/1 | 472 |
| C111111 | 2018/1/1 | 452 |
| C111111 | 2018/1/1 | 248 |
| C111111 | 2018/1/1 | 996 |
+---------+----------+-----+
 The 3rd table I have in mind, to create the retention rate report
+---------+-----------+-----------+---------------+-----------+
|   ID    |  APP_MON  |  ORD_MON  | TimeDiff(Mon) |  TTL_AMT  |
+---------+-----------+-----------+---------------+-----------+
| C111111 | 2018/1/1  | 2018/1/1  |             - |  25,443   |
| C111111 | 2018/1/1  | 2018/2/1  |             1 |  7,610    |
| C111111 | 2018/1/1  | 2018/3/1  |             2 |  20,180   |
| C111111 | 2018/1/1  | 2018/4/1  |             3 |  22,265   |
| C111111 | 2018/1/1  | 2018/5/1  |             4 |  34,118   |
| C111111 | 2018/1/1  | 2018/6/1  |             5 |  19,523   |
| C111111 | 2018/1/1  | 2018/7/1  |             6 |  20,220   |
| C111111 | 2018/1/1  | 2018/8/1  |             7 |  2,006    |
| C111111 | 2018/1/1  | 2018/9/1  |             8 |  15,813   |
| C111111 | 2018/1/1  | 2018/10/1 |             9 |  16,733   |
| C111111 | 2018/1/1  | 2018/11/1 |            10 |  20,973   |
| C111112 | 2018/10/1 | 2017/11/1 |             - |  516      |
| C111112 | 2018/10/1 | 2018/10/1 |             - |  1        |
| C111113 | 2018/10/1 | Null      |             - | Null      |
| C111114 | 2018/12/1 | Null      |             - | Null      |
+---------+-----------+-----------+---------------+-----------+

Definition:
- APP_MON: the month in which the customer joined, i.e. the first day of the month of the latest date among [d.SETUP_DT], [d.REINSTATE_DT] and [d.LOCAL_REINSTATE_DT]
- ORD_MON: the month in which the customer purchased, i.e. the first day of the order date's month
- TimeDiff: the duration in months between APP_MON and ORD_MON, e.g. if A's ORD_MON is 2018/1/1 and A's APP_MON is 2018/2/1, the duration is 1.
- TTL_AMT: the total order amount that the customer bought in the related order date month
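
To make the four definitions above concrete, here is a minimal sketch in plain Python (not the SQL the question asks for). The distributor rows are the ones from the question; the order rows and all function names are hypothetical and only illustrate the logic. It assumes missing dates are simply ignored when picking the latest one, and it reports a same-month order as a difference of 0, whereas the question's table shows "-" there.

```python
from collections import defaultdict
from datetime import date


def month_start(d):
    """First day of d's month (the same idea as Oracle's trunc(d, 'mm'))."""
    return d.replace(day=1)


def app_mon(setup_dt, reinstate_dt, local_reinstate_dt):
    """APP_MON: month of the latest known date; None (Null) values are skipped."""
    known = [d for d in (setup_dt, reinstate_dt, local_reinstate_dt) if d is not None]
    return month_start(max(known))


def month_diff(later, earlier):
    """TimeDiff: calendar months between two month-start dates."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


# DISTRIBUTOR rows from the question: ID -> (SETUP_DT, REINSTATE_DT, LOCAL_REINSTATE_DT)
distributors = {
    "C111111": (date(2018, 1, 1), None, None),
    "C111112": (date(2015, 12, 9), date(2018, 10, 25), date(2018, 10, 25)),
    "C111113": (date(2018, 10, 1), None, None),
    "C111114": (date(2018, 10, 6), date(2018, 12, 14), date(2018, 12, 14)),
}

# Hypothetical ORDER rows (ID, ORD_DT, OAL), made up for illustration only.
orders = [
    ("C111111", date(2018, 1, 1), 112),
    ("C111111", date(2018, 1, 20), 100),
    ("C111111", date(2018, 2, 3), 472),
]

# TTL_AMT: total OAL per customer and ORD_MON.
totals = defaultdict(int)
for cust_id, ord_dt, oal in orders:
    totals[(cust_id, month_start(ord_dt))] += oal

# One report row per customer and order month; customers without orders get Nones.
report = []
for cust_id, dates in sorted(distributors.items()):
    joined = app_mon(*dates)
    months = sorted(m for (c, m) in totals if c == cust_id)
    if not months:
        report.append((cust_id, joined, None, None, None))
    for m in months:
        report.append((cust_id, joined, m, month_diff(m, joined), totals[(cust_id, m)]))

for row in report:
    print(row)
```

With these inputs the APP_MON values come out as 2018/1/1, 2018/10/1, 2018/10/1 and 2018/12/1, which agrees with the APP_MON column of the 3rd table in the question.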

I tried to produce the 3rd table, but the code I ran is very slow. I need a more efficient way since I have millions of rows. Thanks.


Answer 1:


I don't think you need to use unpivot. To get the latest date you can just use the greatest() function.

This solution has two subqueries: one calculates the app_mon for each new customer, and the other calculates the earliest order date in each month for all customers who placed an order in the last two years. This may not be the most performant approach, but your first priority should be to get the correct outcome; once you have that, you can tune it if necessary:

with cust as
(
    -- app_mon: the latest of the three dates. In Oracle, greatest() returns
    -- NULL if any argument is NULL, so missing dates fall back to setup_dt
    -- (assumption: setup_dt itself is never NULL).
    select d.dist_id as id
          , greatest(d.setup_dt
                   , coalesce(d.reinstate_dt, d.setup_dt)
                   , coalesce(d.local_reinstate_dt, d.setup_dt)) as app_mon
    from mjensen_dev.gc_distributor d
    where d.setup_dt >= date '2017-01-01'
    or d.reinstate_dt >= date '2017-01-01'
    or d.local_reinstate_dt >= date '2017-01-01'
) , ord as
(
    -- one row per customer and order month
    select o.dist_id as id
          , min(o.ord_dt) as ord_mon
          , sum(o.oal) as ord_amt
    from gc_orders o
    where o.ord_dt >= date '2017-01-01'
    group by o.dist_id
          , trunc(o.ord_dt,'mm')
)
select cust.id
       , cust.app_mon
       , ord.ord_mon
       , floor(months_between(ord.ord_mon, cust.app_mon)) as mon_diff
       , ord.ord_amt
from cust
     inner join ord on cust.id = ord.id  -- use a left join to keep customers with no orders
order by 1, 2
/

You may wish to tweak my calculation of mon_diff. This calculation treats 2018/2/1 - 2018/1/1 as a one-month difference, because it seems odd to me that a customer who places an order on the day they joined would have a mon_diff of 1 rather than zero. But if your statement of the business rule is correct, you would need to add 1 to the calculation. Likewise, I have not included trunc() in the processing of the dates, but you may wish to reinstate it.
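
The effect of truncating the dates, and of adding 1, can be seen in a small Python sketch. The function whole_months below is a hypothetical stand-in that approximates floor(months_between(later, earlier)); it ignores the time of day and Oracle's special handling of end-of-month dates, and the two sample dates are made up for illustration.

```python
from datetime import date


def month_start(d):
    """First day of d's month (the same idea as Oracle's trunc(d, 'mm'))."""
    return d.replace(day=1)


def whole_months(later, earlier):
    """Approximate floor(months_between(later, earlier)).

    Counts calendar months and subtracts one when the day of month of
    `later` has not yet reached the day of month of `earlier`.
    """
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1
    return months


# Hypothetical customer: joined late in January, ordered early in February.
joined = date(2018, 1, 25)
ordered = date(2018, 2, 10)

# Without truncation, less than a full month has passed.
raw_diff = whole_months(ordered, joined)

# With both dates truncated to the month start, the months are one apart.
truncated_diff = whole_months(month_start(ordered), month_start(joined))

# Variant if the business rule counts the joining month as month 1.
one_based_diff = truncated_diff + 1

print(raw_diff, truncated_diff, one_based_diff)
```

Which of the three values is the right one depends on the business rule, so it is worth checking a few known customers by hand before tuning the query.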



Source: https://stackoverflow.com/questions/53911857/how-to-get-the-3rd-report-to-combine-the-customer-and-order-data
