Proper way to access latest row for each individual identifier?


轻奢々 2021-01-03 11:14

I have a table core_message in Postgres, with millions of rows, that looks like this (simplified):

[table excerpt truncated in the source]
轮回少年 (OP) 2021-01-03 11:54
You put existing answers to good use and came up with great solutions in your own answer. Some missing pieces:


  I'm still trying to understand how to properly use his first RECURSIVE solution ...


You used this query to create the test_boats table with unique mmsi:

select distinct on (mmsi) mmsi from core_message


For many rows per boat (mmsi), use this faster RECURSIVE solution instead:

WITH RECURSIVE cte AS (
   (
   SELECT mmsi
   FROM   core_message
   ORDER  BY mmsi
   LIMIT  1
   )
   UNION ALL
   SELECT m.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      SELECT mmsi
      FROM   core_message
      WHERE  mmsi > c.mmsi
      ORDER  BY mmsi
      LIMIT  1
      ) m
   )
TABLE cte;


This hardly gets any slower with more rows per boat, as opposed to DISTINCT ON, which is typically faster with only a few rows per boat. Either one only needs an index with mmsi as leading column to be fast.
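To try the idea without a Postgres server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite has neither LATERAL nor DISTINCT ON, so the "next larger mmsi" step is written as a correlated scalar subquery with min() instead of CROSS JOIN LATERAL (... LIMIT 1). The recursion follows the same loose-index-scan idea. The function name distinct_mmsi_recursive and the sample rows are hypothetical.

```python
import sqlite3


def distinct_mmsi_recursive(conn):
    """Return each distinct mmsi by jumping from one value to the next larger one.

    Emulates a loose index scan: every step is a single min() lookup that an
    index with mmsi as leading column can answer. SQLite has no LATERAL, so a
    correlated scalar subquery replaces CROSS JOIN LATERAL (... LIMIT 1).
    """
    sql = """
        WITH RECURSIVE cte(mmsi) AS (
            SELECT min(mmsi) FROM core_message          -- smallest mmsi (NULL if empty)
            UNION ALL
            SELECT (SELECT min(mmsi) FROM core_message
                    WHERE  mmsi > cte.mmsi)             -- next larger mmsi, or NULL
            FROM   cte
            WHERE  cte.mmsi IS NOT NULL                 -- stop once NULL is reached
        )
        SELECT mmsi FROM cte WHERE mmsi IS NOT NULL
    """
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE core_message (
    mmsi INTEGER NOT NULL,
    time INTEGER NOT NULL,
    UNIQUE (mmsi, time))""")  # implicit index with mmsi as leading column
# Hypothetical sample data: 3 boats with 1000 rows each
conn.executemany("INSERT INTO core_message VALUES (?, ?)",
                 [(m, t) for m in (111, 222, 333) for t in range(1000)])

print(distinct_mmsi_recursive(conn))
```

Each recursion step is one index probe, so the work grows with the number of distinct boats rather than with the number of rows per boat.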

If possible, create that boats table and add an FK constraint from core_message referencing it. (That means you have to maintain it.) Then you can keep using the optimal LATERAL query from your answer and never miss any boats. (Orphaned boats may be worth tracking / removing in the long run.)
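Here is a sketch of the boats-table setup, again using SQLite as a stand-in. The table name boats, the payload column lat and the sample rows are hypothetical. Your LATERAL query is not reproduced here, and SQLite has no LATERAL, so this sketch picks the latest row per boat with a correlated max(time) subquery. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE boats (mmsi INTEGER PRIMARY KEY);         -- hypothetical boats table
    CREATE TABLE core_message (
        mmsi INTEGER NOT NULL REFERENCES boats (mmsi),     -- FK: every message has a boat
        time INTEGER NOT NULL,
        lat  REAL,                                         -- hypothetical payload column
        UNIQUE (mmsi, time)
    );
""")
conn.executemany("INSERT INTO boats VALUES (?)", [(111,), (222,), (333,)])
conn.executemany("INSERT INTO core_message VALUES (?, ?, ?)",
                 [(111, 1, 1.0), (111, 5, 1.5), (222, 3, 2.0), (222, 4, 2.5)])


def latest_per_boat(conn):
    """One whole row per boat: the message with the greatest time.

    Driven by the boats table, so no boat can be missed. A boat without
    messages (333 here) yields no row, like CROSS JOIN LATERAL would.
    """
    return conn.execute("""
        SELECT m.mmsi, m.time, m.lat
        FROM   boats b
        JOIN   core_message m
          ON   m.mmsi = b.mmsi
         AND   m.time = (SELECT max(time) FROM core_message
                         WHERE  mmsi = b.mmsi)  -- stand-in for LATERAL ... LIMIT 1
        ORDER  BY m.mmsi
    """).fetchall()


print(latest_per_boat(conn))
```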

Otherwise, another variant of that RECURSIVE query is the next best way to quickly get whole rows for the latest position of each boat:

WITH RECURSIVE cte AS (
   (
   SELECT *
   FROM   core_message
   ORDER  BY mmsi DESC, time DESC  -- see below
   LIMIT  1
   )
   UNION ALL
   SELECT m.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      SELECT *
      FROM   core_message
      WHERE  mmsi < c.mmsi
      ORDER  BY mmsi DESC, time DESC
      LIMIT  1
      ) m
   )
TABLE cte;
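The same whole-row recursion can be sketched in SQLite, which has no LATERAL. In this adaptation, each step finds the rowid of the next row in ORDER BY mmsi DESC, time DESC order with a correlated LIMIT 1 subquery. It then joins the full row back by rowid. rowid is SQLite-specific, and the lat column and the data are hypothetical.

```python
import sqlite3

# Walk from the highest mmsi down, collecting the latest row of each mmsi.
LATEST_PER_MMSI = """
WITH RECURSIVE cte(rid, mmsi) AS (
    -- seed: latest row of the highest mmsi
    SELECT rowid, mmsi
    FROM   core_message
    WHERE  rowid = (SELECT rowid FROM core_message
                    ORDER  BY mmsi DESC, time DESC LIMIT 1)
    UNION ALL
    -- step: latest row of the next lower mmsi (stand-in for LATERAL ... LIMIT 1),
    -- when no lower mmsi exists the subquery yields NULL, the join finds
    -- nothing and the recursion stops
    SELECT m.rowid, m.mmsi
    FROM   cte c
    JOIN   core_message m
      ON   m.rowid = (SELECT rowid FROM core_message
                      WHERE  mmsi < c.mmsi
                      ORDER  BY mmsi DESC, time DESC LIMIT 1)
)
SELECT m.mmsi, m.time, m.lat          -- whole rows, fetched back by rowid
FROM   cte
JOIN   core_message m ON m.rowid = cte.rid
ORDER  BY m.mmsi DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE core_message (
    mmsi INTEGER NOT NULL,
    time INTEGER NOT NULL,
    lat  REAL,                        -- hypothetical payload column
    UNIQUE (mmsi, time))""")
conn.executemany("INSERT INTO core_message VALUES (?, ?, ?)",
                 [(111, 1, 1.0), (111, 5, 1.5), (222, 3, 2.0),
                  (222, 4, 2.5), (333, 2, 3.0)])

for row in conn.execute(LATEST_PER_MMSI):
    print(row)
```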


You have both of these indexes:

"core_message_uniq_mmsi_time" UNIQUE CONSTRAINT, btree (mmsi, "time")
"core_messag_mmsi_b36d69_idx" btree (mmsi, "time" DESC)


A UNIQUE constraint is implemented with a unique index that has all columns in the default ASC sort order, and that cannot be changed. If you don't actually need the constraint, you could replace it with a UNIQUE index, which achieves mostly the same. With the index, you can specify any sort order you like. Related:


How does PostgreSQL enforce the UNIQUE constraint / what type of index does it use?
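A small sketch of the unique-index alternative, using SQLite as a stand-in. The index name core_message_mmsi_time_desc is hypothetical. The same CREATE UNIQUE INDEX ... (mmsi, "time" DESC) syntax also works in Postgres. The index rejects duplicate (mmsi, time) pairs just like the constraint, but here the second column is stored in DESC order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE core_message (mmsi INTEGER NOT NULL, time INTEGER NOT NULL)")
# A UNIQUE index instead of a UNIQUE constraint: same uniqueness guarantee,
# but each column can get its own sort order (hypothetical index name).
conn.execute("CREATE UNIQUE INDEX core_message_mmsi_time_desc "
             "ON core_message (mmsi, time DESC)")

conn.execute("INSERT INTO core_message VALUES (111, 1)")
try:
    conn.execute("INSERT INTO core_message VALUES (111, 1)")  # duplicate pair
except sqlite3.IntegrityError as exc:
    print("duplicate rejected:", exc)

# Inspect the key columns and their direction (desc = 1 means DESC)
for seqno, cid, name, desc, coll, key in conn.execute(
        "PRAGMA index_xinfo(core_message_mmsi_time_desc)"):
    if key:
        print(name, "DESC" if desc else "ASC")
```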


But there is no need for that in the use case at hand. Postgres can scan a b-tree index backwards at practically the same speed, and I see nothing here that would require opposite sort directions for the two columns. The additional index core_messag_mmsi_b36d69_idx is expensive dead freight, unless you have other use cases that actually need it. See:


Optimizing queries on a range of timestamps (two columns)


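To make best use of the index core_message_uniq_mmsi_time from the UNIQUE constraint, I step through both columns in descending order. That matters. Scanned backwards, an index on (mmsi, "time") delivers rows in (mmsi DESC, "time" DESC) order, so it can serve ORDER BY mmsi DESC, time DESC directly. It cannot serve a mixed order like ORDER BY mmsi DESC, time ASC.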
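SQLite can show the same effect in its query plan, so here it serves as a stand-in. In this sketch, the helper needs_sort is hypothetical, and the only index is the implicit one from UNIQUE (mmsi, time), with both columns ASC. A plan step mentioning a TEMP B-TREE means SQLite had to sort. Reading the index forwards or backwards satisfies ORDER BY mmsi, time and ORDER BY mmsi DESC, time DESC, while the mixed order forces a sort. In Postgres, EXPLAIN would instead show an Index Scan Backward (or Index Only Scan Backward) for the descending case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE core_message (
    mmsi INTEGER NOT NULL,
    time INTEGER NOT NULL,
    UNIQUE (mmsi, time))""")  # implicit index, both columns ASC


def needs_sort(order_by):
    """True if SQLite must sort instead of reading the index in order."""
    plan = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT mmsi, time FROM core_message ORDER BY {order_by}"
    ).fetchall()
    # The last column of each plan row holds the human-readable detail text
    return any("TEMP B-TREE" in row[-1] for row in plan)


for order_by in ("mmsi, time", "mmsi DESC, time DESC", "mmsi DESC, time"):
    print(f"{order_by:22} sort needed: {needs_sort(order_by)}")
```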