What is the best data structure for storing a set of four (or more) values?

后端 未结 4 1783
难免孤独
难免孤独 2020-12-03 06:33

Say I have the following variables and its corresponding values which represents a record.

name = \'abc\'
age = 23
we         


        
4条回答
  •  南方客
    南方客 (楼主)
    2020-12-03 07:07

    Given http://wiki.python.org/moin/TimeComplexity how about this:

    • Have a dictionary for every column you're interested in - AGE, NAME, etc.
    • Have the keys of that dictionaries (AGE, NAME) be possible values for given column (35 or "m").
    • Have a list of lists representing values for one "collection", e.g. VALUES = [ [35, "m"], ...]
    • Have the value of column dictionaries (AGE, NAME) be lists of indices from the VALUES list.
    • Have a dictionary which maps column name to index within lists in VALUES so that you know that first column is age and second is sex (you could avoid that and use dictionaries, but they introduce large memory footrpint and with over 100K objects this may or not be a problem).

    Then the retrieve function could look like this:

    def retrieve(column_name, column_value):
        if column_name == "age":
            return [VALUES[index] for index in AGE[column_value]]      
        elif ...: # repeat for other "columns"
    

    Then, this is what you get

    VALUES = [[35, "m"], [20, "f"]]
    AGE = {35:[0], 20:[1]}
    SEX = {"m":[0], "f":[1]}
    KEYS = ["age", "sex"]
    
    retrieve("age", 35)
    # [[35, 'm']]
    

    If you want a dictionary, you can do the following:

    [dict(zip(KEYS, values)) for values in retrieve("age", 35)]
    # [{'age': 35, 'sex': 'm'}]
    

    but again, dictionaries are a little heavy on the memory side, so if you can go with lists of values it might be better.

    Both dictionary and list retrieval are O(1) on average - worst case for dictionary is O(n) - so this should be pretty fast. Maintaining that will be a little bit of pain, but not so much. To "write", you'd just have to append to the VALUES list and then append the index in VALUES to each of the dictionaries.

    Of course, then best would be to benchmark your actual implementation and look for potential improvements, but hopefully this make sense and will get you going :)

    EDIT:

    Please note that as @moooeeeep said, this will only work if your values are hashable and therefore can be used as dictionary keys.

提交回复
热议问题