Best way to sort 1M records in Python

前端未结

关注

 11  1608

I have a service that runs that takes a list of about 1,000,000 dictionaries and does the following

myHashTable = {}
myLists = { \'hits\':{}, \'misses\':{},


                      
              相关标签:


      
      
        
          11条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  别那么骄傲        
                
              
                            
                2020-12-05 09:33
              
            
            
                                                                       
Honestly, the best way is to not use Python.  If performance is a major concern for this, use a faster language.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  失恋的感觉        
                
              
                            
                2020-12-05 09:38
              
            
            
                                                                       
You may find this related answer from Guido:  Sorting a million 32-bit integers in 2MB of RAM using Python
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  醉话见心        
                
              
                            
                2020-12-05 09:39
              
            
            
                                                                       
Others have provided some excellent advices, try them out. 

As a general advice, in situations like that you need to profile your code. Know exactly where most of the time is spent. Bottlenecks hide well, in places you least expect them to be.

If there is a lot of number crunching involved then a JIT compiler like the (now-dead) psyco might also help. When processing takes minutes or hours 2x speed-up really counts.


http://docs.python.org/library/profile.html
http://www.vrplumber.com/programming/runsnakerun/ 
http://psyco.sourceforge.net/

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  心在旅途        
                
              
                            
                2020-12-05 09:40
              
            
            
                                                                       
What you really want is an ordered container, instead of an unordered one.  That would implicitly sort the results as they're inserted.  The standard data structure for this is a tree.

However, there doesn't seem to be one of these in Python.  I can't explain that; this is a core, fundamental data type in any language.  Python's dict and set are both unordered containers, which map to the basic data structure of a hash table.  It should definitely have an optimized tree data structure; there are many things you can do with them that are impossible with a hash table, and they're quite tricky to implement well, so people generally don't want to be doing it themselves.

(There's also nothing mapping to a linked list, which also should be a core data type.  No, a deque is not equivalent.)

I don't have an existing ordered container implementation to point you to (and it should probably be implemented natively, not in Python), but hopefully this will point you in the right direction.

A good tree implementation should support iterating across a range by value ("iterate all values from [2,100] in order"), find next/prev value from any other node in O(1), efficient range extraction ("delete all values in [2,100] and return them in a new tree"), etc.  If anyone has a well-optimized data structure like this for Python, I'd love to know about it.  (Not all operations fit nicely in Python's data model; for example, to get next/prev value from another value, you need a reference to a node, not the value itself.)
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  北海茫月        
                
              
                            
                2020-12-05 09:42
              
            
            
                                                                       
sorted(myLists[key], key=mylists[key].get, reverse=True)


should save you some time, though not a lot.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     上一页
1
2
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复