sum highest consecutive occurrence

Question

I have a table with three columns (lending_id int, installment_n serial int, status text), and I want to retrieve the longest run of consecutive WAITING_PAYMENT statuses for each lending_id.

For the following example:

lending_id | installment_n | status
71737      | 1             | PAID
71737      | 2             | PAID
71737      | 3             | PAID
71737      | 4             | PAID
71737      | 5             | PAID
71737      | 6             | WAITING_PAYMENT
71737      | 7             | WAITING_PAYMENT
71737      | 8             | WAITING_PAYMENT
71737      | 9             | WAITING_PAYMENT
71737      | 10            | WAITING_PAYMENT
71737      | 11            | WAITING_PAYMENT
71737      | 12            | WAITING_PAYMENT
71737      | 13            | WAITING_PAYMENT
71737      | 14            | WAITING_PAYMENT
71737      | 15            | WAITING_PAYMENT
71737      | 16            | WAITING_PAYMENT
71737      | 17            | WAITING_PAYMENT
71737      | 18            | WAITING_PAYMENT
71737      | 19            | WAITING_PAYMENT
71737      | 20            | WAITING_PAYMENT
71737      | 21            | WAITING_PAYMENT
354226     | 1             | PAID
354226     | 2             | PAID
354226     | 3             | WAITING_PAYMENT
354226     | 4             | WAITING_PAYMENT
354226     | 5             | WAITING_PAYMENT
354226     | 6             | WAITING_PAYMENT
354226     | 7             | PAID
354226     | 8             | WAITING_PAYMENT
354226     | 9             | WAITING_PAYMENT
354226     | 10            | WAITING_PAYMENT
354226     | 11            | WAITING_PAYMENT
354226     | 12            | WAITING_PAYMENT
354226     | 13            | WAITING_PAYMENT
354226     | 14            | WAITING_PAYMENT
354226     | 15            | WAITING_PAYMENT

I would like to retrieve:

lending_id | count
71737      | 16
354226     | 8

For 71737 the longest run covers installments 6 to 21 (16 rows), and for 354226 it covers installments 8 to 15 (8 rows).
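To make the expected result concrete, here is a minimal Python sketch that computes it directly from the sample rows. The function name `longest_waiting_run` is hypothetical, and the sketch assumes installment numbers within a lending_id have no missing values, so adjacent rows are always consecutive installments.

```python
from itertools import groupby

# Sample rows from the question as (lending_id, installment_n, status).
rows = [(71737, n, "PAID") for n in range(1, 6)]
rows += [(71737, n, "WAITING_PAYMENT") for n in range(6, 22)]
rows += [(354226, n, "PAID") for n in (1, 2, 7)]
rows += [(354226, n, "WAITING_PAYMENT") for n in [3, 4, 5, 6, *range(8, 16)]]


def longest_waiting_run(rows, target="WAITING_PAYMENT"):
    """Return {lending_id: length of the longest consecutive run of target}."""
    result = {}
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for lending_id, installments in groupby(ordered, key=lambda r: r[0]):
        # Grouping ordered installments by status yields one group per run.
        for status, run in groupby(installments, key=lambda r: r[2]):
            length = sum(1 for _ in run)
            if status == target:
                result[lending_id] = max(result.get(lending_id, 0), length)
    return result


longest = longest_waiting_run(rows)
print(longest)  # {71737: 16, 354226: 8}
```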

Answer 1:

This approach mimics row_number() with user variables, so it works on MySQL versions that do not support window functions (window functions were added in MySQL 8.0).

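Besides the count, the result also shows the status and the first and last installment of the longest sequence; see the results below. The query ranks runs of every status, not only WAITING_PAYMENT; in the sample data the longest run for each lending_id happens to be WAITING_PAYMENT.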

SQL Fiddle

MySQL 5.6 Schema Setup:

CREATE TABLE Table1
    (`lending_id` int, `installment_n` int, `status` varchar(15))
;

INSERT INTO Table1
    (`lending_id`, `installment_n`, `status`)
VALUES
    (71737, 1, 'PAID'),
    (71737, 2, 'PAID'),
    (71737, 3, 'PAID'),
    (71737, 4, 'PAID'),
    (71737, 5, 'PAID'),
    (71737, 6, 'WAITING_PAYMENT'),
    (71737, 7, 'WAITING_PAYMENT'),
    (71737, 8, 'WAITING_PAYMENT'),
    (71737, 9, 'WAITING_PAYMENT'),
    (71737, 10, 'WAITING_PAYMENT'),
    (71737, 11, 'WAITING_PAYMENT'),
    (71737, 12, 'WAITING_PAYMENT'),
    (71737, 13, 'WAITING_PAYMENT'),
    (71737, 14, 'WAITING_PAYMENT'),
    (71737, 15, 'WAITING_PAYMENT'),
    (71737, 16, 'WAITING_PAYMENT'),
    (71737, 17, 'WAITING_PAYMENT'),
    (71737, 18, 'WAITING_PAYMENT'),
    (71737, 19, 'WAITING_PAYMENT'),
    (71737, 20, 'WAITING_PAYMENT'),
    (71737, 21, 'WAITING_PAYMENT'),
    (354226, 1, 'PAID'),
    (354226, 2, 'PAID'),
    (354226, 3, 'WAITING_PAYMENT'),
    (354226, 4, 'WAITING_PAYMENT'),
    (354226, 5, 'WAITING_PAYMENT'),
    (354226, 6, 'WAITING_PAYMENT'),
    (354226, 7, 'PAID'),
    (354226, 8, 'WAITING_PAYMENT'),
    (354226, 9, 'WAITING_PAYMENT'),
    (354226, 10, 'WAITING_PAYMENT'),
    (354226, 11, 'WAITING_PAYMENT'),
    (354226, 12, 'WAITING_PAYMENT'),
    (354226, 13, 'WAITING_PAYMENT'),
    (354226, 14, 'WAITING_PAYMENT'),
    (354226, 15, 'WAITING_PAYMENT')
;

Query 1:

select lending_id, status, start_at_inst, end_at_inst, inst_count
from (
      select IF(@prev_value=lending_id, @rn:=@rn+1 , @rn:=1) AS rn
            , lending_id, status, start_at_inst, end_at_inst, inst_count
            , @prev_value := lending_id z
      from (
           select lending_id
                   , status
                   , grpby
                   , min(installment_n) start_at_inst
                   , max(installment_n) end_at_inst
                   , (max(installment_n) + 1) - min(installment_n) inst_count
            from (
                 select
                        IF(@prev_value=concat_ws(',',lending_id,status), @rn:=@rn+1 , @rn:=1) AS rn
                      , t.*
                      , installment_n - @rn grpby
                      , @prev_value := concat_ws(',',lending_id,status) z
                 from Table1 t
                 cross join (
                     select @rn := 0 , @prev_value := ''
                     ) vars
                 order by lending_id, status,installment_n ASC
                 ) d1
            group by lending_id, status, grpby
          ) d2
      cross join (
          select @rn := 0 , @prev_value := ''
          ) vars
      order by lending_id, inst_count DESC
     ) d3
where rn = 1

Results:

| lending_id |          status | start_at_inst | end_at_inst | inst_count |
|------------|-----------------|---------------|-------------|------------|
|     354226 | WAITING_PAYMENT |             8 |          15 |          8 |
|      71737 | WAITING_PAYMENT |             6 |          21 |         16 |

If your database supports window functions (MySQL 8.0 and later, MariaDB 10.2 and later), here is the same approach using row_number(), which I would expect to be more efficient than the @variable approach.

select
       lending_id, status, start_at_inst, end_at_inst, inst_count
from (
select 
       lending_id
       , status
       , grpby
       , min(installment_n) start_at_inst
       , max(installment_n) end_at_inst
       , (max(installment_n) + 1) - min(installment_n) inst_count
       , row_number() over(partition by lending_id order by (max(installment_n) + 1) - min(installment_n) DESC) rn
from (
     select
            t.*
          , installment_n - row_number() over(partition by lending_id, status order by installment_n) grpby
     from Table1 t
     ) d1
group by
       lending_id, status, grpby
    ) d2
where rn = 1
;

Result:

 lending_id | status          | start_at_inst | end_at_inst | inst_count
 ---------: | :-------------- | ------------: | ----------: | ---------:
      71737 | WAITING_PAYMENT |             6 |          21 |         16
     354226 | WAITING_PAYMENT |             8 |          15 |          8

Tested on dbfiddle (MariaDB 10.2).
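Both queries rely on the same idea: within one (lending_id, status) partition ordered by installment_n, the difference installment_n - row_number stays constant as long as the installments are consecutive, and changes whenever there is a gap. The Python sketch below mirrors that logic outside the database; the function name `longest_runs` is hypothetical and the code is an illustration of the technique rather than a translation of the exact SQL.

```python
from itertools import groupby

# Sample rows from the question as (lending_id, installment_n, status).
rows = [(71737, n, "PAID") for n in range(1, 6)]
rows += [(71737, n, "WAITING_PAYMENT") for n in range(6, 22)]
rows += [(354226, n, "PAID") for n in (1, 2, 7)]
rows += [(354226, n, "WAITING_PAYMENT") for n in [3, 4, 5, 6, *range(8, 16)]]


def longest_runs(rows):
    """Return {lending_id: (status, start_at_inst, end_at_inst, inst_count)}."""
    best = {}
    # Same ordering as: partition by lending_id, status order by installment_n.
    ordered = sorted(rows, key=lambda r: (r[0], r[2], r[1]))
    for (lending_id, status), part in groupby(ordered, key=lambda r: (r[0], r[2])):
        islands = {}  # grpby value -> (first installment, last installment)
        for rn, (_, n, _) in enumerate(part, start=1):
            grpby = n - rn  # constant within a run of consecutive installments
            lo, hi = islands.get(grpby, (n, n))
            islands[grpby] = (min(lo, n), max(hi, n))
        for lo, hi in islands.values():
            count = hi - lo + 1  # inclusive count, like (max + 1) - min
            if lending_id not in best or count > best[lending_id][3]:
                best[lending_id] = (status, lo, hi, count)
    return best


best = longest_runs(rows)
print(best[71737])   # ('WAITING_PAYMENT', 6, 21, 16)
print(best[354226])  # ('WAITING_PAYMENT', 8, 15, 8)
```

For lending_id 354226, the WAITING_PAYMENT installments 3 to 6 get row numbers 1 to 4 (difference 2), while installments 8 to 15 get row numbers 5 to 12 (difference 3), which is why they end up in separate groups.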

Answer 2:

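The SQL below is easy to read, but it only measures the last run of WAITING_PAYMENT rows, that is, the installments after the most recent PAID one. That matches the sample data, where the longest run is also the last one:

-- + 1 so that both the first and the last installment are counted
select t1.lending_id, max(t1.installment_n) - min(t1.installment_n) + 1 as count
from Table1 t1
where t1.status = 'WAITING_PAYMENT'
-- coalesce keeps lending_ids that have no PAID installment at all
and t1.installment_n > coalesce(
  (SELECT max(t2.installment_n) FROM Table1 t2 where t2.lending_id = t1.lending_id and t2.status = 'PAID'), 0)
group by t1.lending_id;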
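A query of this shape can be checked against the sample data with Python's sqlite3 module. The sketch below assumes the table is named Table1, adds 1 to the difference so both endpoints are counted, and wraps the subquery in COALESCE so a lending_id without any PAID row is still returned. Lending 999 is hypothetical data, not part of the question, added to show that this approach measures only the run after the last PAID installment.

```python
import sqlite3

# Sample rows from the question as (lending_id, installment_n, status).
rows = [(71737, n, "PAID") for n in range(1, 6)]
rows += [(71737, n, "WAITING_PAYMENT") for n in range(6, 22)]
rows += [(354226, n, "PAID") for n in (1, 2, 7)]
rows += [(354226, n, "WAITING_PAYMENT") for n in [3, 4, 5, 6, *range(8, 16)]]
# Hypothetical lending: the longest run (1 to 5) is NOT the trailing one (7 to 8).
rows += [(999, n, "WAITING_PAYMENT") for n in (1, 2, 3, 4, 5, 7, 8)]
rows += [(999, 6, "PAID")]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (lending_id INTEGER, installment_n INTEGER, status TEXT)"
)
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)

QUERY = """
SELECT t1.lending_id,
       MAX(t1.installment_n) - MIN(t1.installment_n) + 1 AS run_length
FROM Table1 t1
WHERE t1.status = 'WAITING_PAYMENT'
  AND t1.installment_n > COALESCE(
        (SELECT MAX(t2.installment_n)
         FROM Table1 t2
         WHERE t2.lending_id = t1.lending_id AND t2.status = 'PAID'), 0)
GROUP BY t1.lending_id
ORDER BY t1.lending_id
"""
trailing = dict(conn.execute(QUERY).fetchall())
print(trailing)  # {999: 2, 71737: 16, 354226: 8}
```

The two lendings from the question come out as 16 and 8, but the hypothetical lending 999 reports 2 even though its longest run has 5 installments.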

For any further clarifications please don't hesitate to ask me.

Ted.

Answer 3:

You can use a correlated subquery and some additional logic:

select lending_id, max(cnt)
from (select lending_id, t.next_in, count(*) as cnt
      from (select t.*,
                   (select min(t2.installment_n)
                    from t t2
                    where t2.lending_id = t.lending_id and t2.installment_n > t.installment_n and
                          t2.status <> 'WAITING_PAYMENT'
                   ) as next_in
            from t 
            where t.status = 'WAITING_PAYMENT'
           ) t
      group by lending_id, t.next_in
     ) lt
group by lending_id;

How does this work? The innermost subquery gets the next installment number that is not WAITING_PAYMENT -- or NULL if there is none. This identifies all groups of sequential WAITING_PAYMENT records.

The middle subquery calculates the number in each group. The outer query takes the maximum.
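As a quick check of the three steps described above, here is a minimal sketch that runs the same query shape on the sample data with Python's sqlite3 module. The table name t is taken from the query; the derived-table alias w and the column alias longest_run are renamed here for readability and are assumptions of this sketch.

```python
import sqlite3

# Sample rows from the question as (lending_id, installment_n, status).
rows = [(71737, n, "PAID") for n in range(1, 6)]
rows += [(71737, n, "WAITING_PAYMENT") for n in range(6, 22)]
rows += [(354226, n, "PAID") for n in (1, 2, 7)]
rows += [(354226, n, "WAITING_PAYMENT") for n in [3, 4, 5, 6, *range(8, 16)]]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (lending_id INTEGER, installment_n INTEGER, status TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

QUERY = """
SELECT lending_id, MAX(cnt) AS longest_run
FROM (SELECT w.lending_id, w.next_in, COUNT(*) AS cnt
      FROM (SELECT t.*,
                   -- next installment that is not WAITING_PAYMENT, or NULL
                   (SELECT MIN(t2.installment_n)
                    FROM t t2
                    WHERE t2.lending_id = t.lending_id
                      AND t2.installment_n > t.installment_n
                      AND t2.status <> 'WAITING_PAYMENT') AS next_in
            FROM t
            WHERE t.status = 'WAITING_PAYMENT') w
      GROUP BY w.lending_id, w.next_in) lt
GROUP BY lending_id
ORDER BY lending_id
"""
longest = dict(conn.execute(QUERY).fetchall())
print(longest)  # {71737: 16, 354226: 8}
```

For lending_id 354226, installments 3 to 6 share next_in = 7 (a group of 4), while installments 8 to 15 share next_in = NULL (a group of 8), so the outer MAX returns 8.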

Source: https://stackoverflow.com/questions/46124330/sum-highest-consecutive-occurrence

Tags

mysql

sql

analytics