Finding continuous ranges in a set of numbers

谎友^ 2020-12-11 04:44

I have a reasonably large set of phone numbers (approximately 2 million) in a database table. These numbers have been inserted in blocks, so there are many continuous ranges

4 Answers
  • 2020-12-11 05:00

    Use an auxiliary table of all possible sequential values, or materialize one in a CTE, e.g.:

    WITH
    -- materialize a table of sequential integers
    l0 AS (SELECT 0 AS c UNION ALL SELECT 0),
    l1 AS (SELECT 0 AS c FROM l0 AS a, l0 AS b),
    l2 AS (SELECT 0 AS c FROM l1 AS a, l1 AS b),
    l3 AS (SELECT 0 AS c FROM l2 AS a, l2 AS b),
    l4 AS (SELECT 0 AS c FROM l2 AS a, l3 AS b),
    l5 AS (SELECT 0 AS c FROM l2 AS a, l4 AS b),
    nums AS (SELECT row_number() OVER(ORDER BY c) AS n FROM l5), 
    -- materialize sample table
    MyTable (ID) AS 
    (
     SELECT 1000
     UNION ALL 
     SELECT 1001
     UNION ALL 
     SELECT 1002
     UNION ALL 
     SELECT 1010
     UNION ALL 
     SELECT 1011
     UNION ALL 
     SELECT 1012
     UNION ALL 
     SELECT 1013
     UNION ALL 
     SELECT 1020
     UNION ALL 
     SELECT 1021
     UNION ALL 
     SELECT 1022
    ), 
    -- materialize parameter table
    params (param) AS (SELECT 1012)
    SELECT MIN(N1.n) - 1 AS last_in_sequence
      FROM nums AS N1 
           CROSS JOIN params AS P1
     WHERE N1.n > P1.param
           AND NOT EXISTS 
           (
            SELECT * 
              FROM MyTable AS T1
             WHERE N1.n = T1.ID
           );
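
    For readers who want to try the idea outside SQL Server, here is a minimal sketch using Python's standard sqlite3 module. It is an adaptation, not the query above: the sequence table is generated with a recursive CTE instead of cross joins, and the function name last_in_sequence is only illustrative. The logic is the same: find the first sequential value after the parameter that is missing from MyTable, then subtract one.

```python
import sqlite3

NUMBERS = [1000, 1001, 1002, 1010, 1011, 1012, 1013, 1020, 1021, 1022]

def last_in_sequence(numbers, param):
    """Return the last number of the gap-free run that continues after param."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (ID INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO MyTable (ID) VALUES (?)", [(n,) for n in numbers])
    row = conn.execute(
        """
        WITH RECURSIVE nums(n) AS (
            -- sequential integers from param + 1 up to MAX(ID) + 1
            SELECT ? + 1
            UNION ALL
            SELECT n + 1 FROM nums WHERE n <= (SELECT MAX(ID) FROM MyTable)
        )
        SELECT MIN(n) - 1            -- first missing value, minus one
          FROM nums
         WHERE NOT EXISTS (SELECT 1 FROM MyTable AS T1 WHERE T1.ID = nums.n)
        """,
        (param,),
    ).fetchone()
    conn.close()
    return row[0]

if __name__ == "__main__":
    print(last_in_sequence(NUMBERS, 1012))
```

    With the sample data, a parameter of 1012 gives 1013, since 1014 is the first value missing from the table.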
    
  • 2020-12-11 05:11

    Theoretically, the items in a set have no particular order, so I'm assuming you also have some continuous ID column that defines the order of the numbers. Something like this:

    ID  Number
    1   1000
    2   1001
    3   1002
    4   1010
    5   1011
    6   1012
    7   1013
    8   1020
    9   1021
    10  1022
    

    You could create an extra column that contains the result of Number - ID:

    ID  Number  Diff
    1   1000    999
    2   1001    999
    3   1002    999
    4   1010    1006
    5   1011    1006
    6   1012    1006
    7   1013    1006
    8   1020    1012
    9   1021    1012
    10  1022    1012
    

    Numbers in the same range will have the same result in the Diff column.
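
    As a rough sketch of how the Diff value can be put to work, grouping by Number - ID yields one row per range. The example below runs the grouping through Python's standard sqlite3 module; the table name phone and the function name find_ranges are made up for illustration. It relies on the assumption stated above: the ID column has no gaps and follows the order of the numbers.

```python
import sqlite3

# (ID, Number) pairs from the sample table
ROWS = [(1, 1000), (2, 1001), (3, 1002), (4, 1010), (5, 1011),
        (6, 1012), (7, 1013), (8, 1020), (9, 1021), (10, 1022)]

def find_ranges(rows):
    """Return (first, last) for every gap-free run, grouping by number - id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE phone (id INTEGER PRIMARY KEY, number INTEGER)")
    conn.executemany("INSERT INTO phone (id, number) VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT MIN(number), MAX(number) FROM phone "
        "GROUP BY number - id ORDER BY MIN(number)"
    ).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for first, last in find_ranges(ROWS):
        print(first, last)
```

    For the sample data this produces the three ranges 1000-1002, 1010-1013 and 1020-1022.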

  • 2020-12-11 05:12

    If you use SQL Server, you should be able to write a recursive query that joins on root.number = leaf.number + 1.

    If you select the number from the root and from the last recursion, and the level of the recursion you should have a working query.

    I would test the performance of that first and, if it is not satisfactory, turn to a cursor or row-based approach (which in this case does the job with a single full scan, whereas the recursion can fail by reaching the maximum recursion depth).
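
    A recursive query of this kind could look roughly like the sketch below. It uses SQLite (through Python's standard sqlite3 module) as a stand-in, so the syntax differs slightly from SQL Server, which omits the RECURSIVE keyword. The table name phone, the CTE name walk and the function name range_end are hypothetical. This version walks upward from the starting number (leaf.number = root.number + 1); swapping the roles of root and leaf in the join walks downward instead.

```python
import sqlite3

NUMBERS = [1000, 1001, 1002, 1010, 1011, 1012, 1013, 1020, 1021, 1022]

def range_end(numbers, start):
    """Follow number + 1 links from start; return (last number, recursion level)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE phone (number INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO phone (number) VALUES (?)", [(n,) for n in numbers])
    row = conn.execute(
        """
        WITH RECURSIVE walk(number, level) AS (
            -- anchor: the starting number, if it exists in the table
            SELECT number, 0 FROM phone WHERE number = ?
            UNION ALL
            -- recursive step: join each row to its direct successor
            SELECT leaf.number, root.level + 1
              FROM walk AS root
              JOIN phone AS leaf ON leaf.number = root.number + 1
        )
        SELECT MAX(number), MAX(level) FROM walk
        """,
        (start,),
    ).fetchone()
    conn.close()
    return row

if __name__ == "__main__":
    print(range_end(NUMBERS, 1012))
```

    The recursion level equals the number of steps taken, so a range of n remaining numbers needs n levels; that is where the recursion depth limit mentioned above becomes relevant.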

    Otherwise, your option is to store the data differently and maintain a list of min and max numbers associated with the table.

    This could actually be implemented in triggers without too high a penalty on single-row updates: a change to a single row of the base table would update, delete, or split a row in the min-max table, and which of these applies can be determined by querying only the 'previous' and 'next' rows.
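
    To make the bookkeeping concrete, here is a small in-memory sketch in Python of what such triggers would have to do. It is not trigger code: the min-max table is modelled as a sorted list of [lo, hi] pairs, and the function names insert_number and delete_number are invented for the example. Each operation looks only at the neighbouring ranges.

```python
import bisect

def insert_number(ranges, n):
    """Add n to a sorted list of [lo, hi] ranges, looking only at the neighbours."""
    i = bisect.bisect_right(ranges, [n, float("inf")])  # count of ranges with lo <= n
    prev = ranges[i - 1] if i > 0 else None
    nxt = ranges[i] if i < len(ranges) else None
    if prev is not None and prev[0] <= n <= prev[1]:
        return  # n is already covered
    joins_prev = prev is not None and prev[1] + 1 == n
    joins_next = nxt is not None and nxt[0] - 1 == n
    if joins_prev and joins_next:   # n closes the gap: merge two rows into one
        prev[1] = nxt[1]
        del ranges[i]
    elif joins_prev:                # extend the previous range upward
        prev[1] = n
    elif joins_next:                # extend the next range downward
        nxt[0] = n
    else:                           # isolated number: new single-element range
        ranges.insert(i, [n, n])

def delete_number(ranges, n):
    """Remove n, shrinking, deleting or splitting the range that contains it."""
    i = bisect.bisect_right(ranges, [n, float("inf")]) - 1
    if i < 0 or ranges[i][1] < n:
        return  # n is not stored
    lo, hi = ranges[i]
    if lo == hi:
        del ranges[i]
    elif n == lo:
        ranges[i][0] = n + 1
    elif n == hi:
        ranges[i][1] = n - 1
    else:                           # split one range into two
        ranges[i][1] = n - 1
        ranges.insert(i + 1, [n + 1, hi])

if __name__ == "__main__":
    ranges = [[1000, 1002], [1010, 1013]]
    insert_number(ranges, 1003)
    delete_number(ranges, 1011)
    print(ranges)
```

    In a database, the neighbour lookups would be two indexed queries against the min-max table, which is why the cost per single-row change stays small.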

  • 2020-12-11 05:13

    SQL can't really do this in a single query (unless there are native SQL enhancements I don't know about), because SQL can't access the row 'before' or 'after'.

    You need to go through the sequence in a loop.

    You may try NHibernate's Enumerable, which doesn't load the entities into memory but only creates proxies for them. Actually, I don't think that is a good idea, because it will create proxies for all 2 million numbers.

    Plan B: use paging. Roughly, it looks like this:

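    // Collects the numbers that directly follow `input` without a gap.
    List<PhoneNumber> result = new List<PhoneNumber>();
    
    int input = 1012;
    int pageSize = 100;
    int currentPage = 0;
    int expectedNumber = input;
    
    bool carryOn = true;
    
    while (carryOn)
    {
      // Page through the numbers above the input, in ascending order.
      var numbers = session
        .CreateQuery("from PhoneNumber pn where pn.Number > :input order by pn.Number")
        .SetInt32("input", input)
        .SetFirstResult(currentPage * pageSize)
        .SetMaxResults(pageSize)
        .List<PhoneNumber>();
    
      // No rows left: the range runs to the end of the table.
      if (numbers.Count == 0) break;
    
      foreach (var number in numbers)
      {
        expectedNumber++;
        if (number.Number != expectedNumber)
        {
          // First gap found: the range ends at the previous number.
          carryOn = false;
          break;
        }
        result.Add(number);
      }
    
      currentPage++;
    }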
    

    Do the same in the other direction to collect the part of the range that lies before the input.
