merge DATE-rows if episodes are in direct succession or overlapping

后端未结

关注

 4  889

我寻月下人不归 2021-01-14 07:47

I have a table like this:

ID    BEGIN    END

If there are overlapping episodes for the same ID (like 2000-01-01 - 2001-1


      
      
        
          4条回答        

        
                    
            
            
                         
                
              
              
                
                   無奈伤痛
                                             
                
                
                (楼主)
            
              
              
                2021-01-14 08:26
              

            
            
                        
Edit:  That is great news that your DBA agreed to upgrade to a newer version of PostgreSQL. The windowing functions alone make the upgrade a worthwhile investment.  

My original answer—as you note—has a major flaw: a limitation of one row per id.

Below is a better solution without such a limitation.

I have tested it using test tables on my system (8.4).  

If / when you get a moment I would like to know how it performs on your data.

I also wrote up an explanation here: https://www.mechanical-meat.com/1/detail

WITH RECURSIVE t1_rec ( id, "begin", "end", n ) AS (
    SELECT id, "begin", "end", n
      FROM (
        SELECT
            id, "begin", "end",
            CASE 
                WHEN LEAD("begin") OVER (
                PARTITION BY    id
                ORDER BY        "begin") <= ("end" + interval '2' day) 
                THEN 1 ELSE 0 END AS cl,
            ROW_NUMBER() OVER (
                PARTITION BY    id
                ORDER BY        "begin") AS n
        FROM mytable 
    ) s
    WHERE s.cl = 1
  UNION ALL
    SELECT p1.id, p1."begin", p1."end", a.n
      FROM t1_rec a 
           JOIN mytable p1 ON p1.id = a.id
       AND p1."begin" > a."begin"
       AND (a."begin",  a."end" + interval '2' day) OVERLAPS 
           (p1."begin", p1."end")
)
SELECT t1.id, min(t1."begin"), max(t1."end")
  FROM t1_rec t1
       LEFT JOIN t1_rec t2 ON t1.id = t2.id 
       AND t2."end" = t1."end"
       AND t2.n < t1.n
 WHERE t2.n IS NULL
 GROUP BY t1.id, t1.n
 ORDER BY t1.id, t1.n;


Original (deprecated) answer follows;

note: limitation of one row per id.



Denis is probably right about using lead() and lag(), but there is yet another way!

You can also solve this problem using so-called recursive SQL.

The overlaps function also comes in handy.  

I have fully tested this solution on my system (8.4).

It works well.  

WITH RECURSIVE rec_stmt ( id, begin, end ) AS (
    /* seed statement: 
           start with only first start and end dates for each id 
    */
      SELECT id, MIN(begin), MIN(end)
        FROM mytable seed_stmt
    GROUP BY id

    UNION ALL

    /* iterative (not really recursive) statement: 
           append qualifying rows to resultset 
    */
      SELECT t1.id, t1.begin, t1.end
        FROM rec_stmt r
             JOIN mytable t1 ON t1.id = r.id
         AND t1.begin > r.end
         AND (r.begin, r.end + INTERVAL '1' DAY) OVERLAPS 
             (t1.begin - INTERVAL '1' DAY, t1.end)
)
  SELECT MIN(begin), MAX(end) 
    FROM rec_stmt
GROUP BY id;

    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它4个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复