What is the best data structure for storing a set of four (or more) values?


难免孤独 2020-12-03 06:33

Say I have the following variables and their corresponding values, which together represent a record.

name = 'abc'
age = 23
we


      
      
        
4 answers

        
                    
            
            
                         
                
              
              
                
                   南方客
                                             
                
                
(original poster)
            
              
              
                2020-12-03 07:07
              

            
            
                        
Given http://wiki.python.org/moin/TimeComplexity, how about this:


Have a dictionary for every column you're interested in: AGE, NAME, etc.
Have the keys of those dictionaries (AGE, NAME) be the possible values for the given column (35 or "m").
Have a list of lists representing the values for one "collection", e.g. VALUES = [[35, "m"], ...].
Have the values of the column dictionaries (AGE, NAME) be lists of indices into the VALUES list.
Have a dictionary which maps each column name to its index within the lists in VALUES, so that you know the first column is age and the second is sex (you could avoid that and use dictionaries for the records, but they introduce a large memory footprint, and with over 100K objects this may or may not be a problem).


Then the retrieve function could look like this:

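def retrieve(column_name, column_value):
    # One index dictionary per column; add an entry for each further column.
    column_index = {"age": AGE, "sex": SEX}[column_name]
    # Look up the matching row indices, then fetch the rows themselves.
    return [VALUES[index] for index in column_index[column_value]]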


Then, this is what you get:

VALUES = [[35, "m"], [20, "f"]]
AGE = {35:[0], 20:[1]}
SEX = {"m":[0], "f":[1]}
KEYS = ["age", "sex"]

retrieve("age", 35)
# [[35, 'm']]


If you want a dictionary, you can do the following:

[dict(zip(KEYS, values)) for values in retrieve("age", 35)]
# [{'age': 35, 'sex': 'm'}]


but again, dictionaries are a little heavy on the memory side, so if you can go with lists of values it might be better.
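To get a rough feel for that memory difference, the container sizes can be compared with `sys.getsizeof`. This is only a sketch: the variable names are made up for illustration, the exact byte counts depend on the Python implementation and version, and `getsizeof` measures the container alone, not the objects it references.

```python
import sys

as_list = [35, "m"]                    # one record stored as a plain list
as_dict = {"age": 35, "sex": "m"}      # the same record stored as a dictionary

# Sizes are in bytes and cover only the container object itself.
print("list:", sys.getsizeof(as_list))
print("dict:", sys.getsizeof(as_dict))
```

On CPython the dictionary is noticeably larger per record, which adds up over 100K records.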

Both dictionary and list retrieval are O(1) on average (the worst case for a dictionary is O(n)), so this should be pretty fast. Maintaining this structure is a bit of a pain, but not too much. To "write", you just append to the VALUES list and then append the new index in VALUES to the matching list in each of the column dictionaries.
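The "write" step described above could be sketched like this. `add_record` is a hypothetical helper name, the two-column layout (age, sex) is taken from the example data, and `defaultdict` is just one convenient way to create the index lists on first use.

```python
from collections import defaultdict

VALUES = []                 # one inner list per record: [age, sex]
AGE = defaultdict(list)     # age value -> indices into VALUES
SEX = defaultdict(list)     # sex value -> indices into VALUES


def add_record(age, sex):
    """Append a record and register its index in every column dictionary."""
    index = len(VALUES)     # position the new record is about to occupy
    VALUES.append([age, sex])
    AGE[age].append(index)
    SEX[sex].append(index)
    return index


add_record(35, "m")
add_record(20, "f")
add_record(35, "f")

print([VALUES[i] for i in AGE[35]])
# [[35, 'm'], [35, 'f']]
```

One caveat with `defaultdict`: reading a missing key with `AGE[value]` silently creates an empty entry, so lookups may prefer `AGE.get(value, [])`.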

Of course, the best thing would be to benchmark your actual implementation and look for potential improvements, but hopefully this makes sense and will get you going :)
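A minimal benchmark along those lines might look like the following. The data is random and purely illustrative, `by_index` and `by_scan` are hypothetical names, and the timings will differ from machine to machine, so treat this as a starting point rather than a result.

```python
import random
import timeit

random.seed(0)
# 100K made-up records of the form [age, sex].
VALUES = [[random.randint(0, 99), random.choice("mf")] for _ in range(100_000)]

# Build the age index: age value -> list of indices into VALUES.
AGE = {}
for index, (age, _sex) in enumerate(VALUES):
    AGE.setdefault(age, []).append(index)


def by_index(age):
    """Fetch matching records through the index dictionary."""
    return [VALUES[i] for i in AGE.get(age, [])]


def by_scan(age):
    """Fetch matching records by scanning every record."""
    return [row for row in VALUES if row[0] == age]


# Both approaches must agree before their timings are worth comparing.
assert by_index(35) == by_scan(35)

print("index:", timeit.timeit(lambda: by_index(35), number=20))
print("scan: ", timeit.timeit(lambda: by_scan(35), number=20))
```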

EDIT:

Please note that as @moooeeeep said, this will only work if your values are hashable and therefore can be used as dictionary keys.
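As a small illustration of the hashability requirement (the variable names here are made up for the example): a list cannot be used as a dictionary key, while the equivalent tuple can.

```python
index = {}
error = None

try:
    index[[1, 2]] = [0]          # a list is mutable and therefore unhashable
except TypeError as exc:
    error = str(exc)             # the exact wording varies between Python versions
    print("TypeError:", error)

index[tuple([1, 2])] = [0]       # converting to a tuple gives a hashable key
print(index[(1, 2)])
# [0]
```

So if a column holds lists or other mutable values, they would need to be converted to a hashable form (such as a tuple) before being used as keys.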
    
             
                                                        
            
            
              
                