Say I have the following variables and their corresponding values, which represent a record:
name = 'abc'
age = 23
we
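Given http://wiki.python.org/moin/TimeComplexity, how about this:
- Have a dictionary for every column you're interested in: AGE, NAME, etc.
- Have the keys of those dictionaries (AGE, NAME) be the possible values for the given column (35 or "m").
- Have a list of lists holding the values of each record, e.g. VALUES = [[35, "m"], ...].
- Have the values of the column dictionaries (AGE, NAME) be lists of indices into the VALUES list.
- Keep a list of column names (KEYS in the example below) recording the column order inside each entry of VALUES, so that you know that the first column is age and the second is sex (you could avoid that and store each record as a dictionary instead, but dictionaries have a large memory footprint, and with over 100K objects this may or may not be a problem).

Then the retrieve function could look like this: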
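def retrieve(column_name, column_value):
    if column_name == "age":
        return [VALUES[index] for index in AGE[column_value]]
    elif column_name == "sex":  # repeat for other "columns"
        return [VALUES[index] for index in SEX[column_value]]
    raise ValueError("unknown column: %r" % column_name)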
Then, with the following data, this is what you get:
VALUES = [[35, "m"], [20, "f"]]
AGE = {35:[0], 20:[1]}
SEX = {"m":[0], "f":[1]}
KEYS = ["age", "sex"]
retrieve("age", 35)
# [[35, 'm']]
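The if/elif chain in retrieve grows with every column you add. A minimal sketch of one alternative, assuming a hypothetical INDEXES dictionary that maps each column name to that column's value-to-indices dictionary, replaces the chain with a single lookup and returns an empty list for values that were never stored:

```python
# Same sample data as above.
VALUES = [[35, "m"], [20, "f"]]
AGE = {35: [0], 20: [1]}
SEX = {"m": [0], "f": [1]}

# Hypothetical helper: column name -> that column's {value: [record indices]} dict.
INDEXES = {"age": AGE, "sex": SEX}


def retrieve(column_name, column_value):
    """Return every record whose `column_name` column equals `column_value`."""
    index = INDEXES[column_name]  # raises KeyError for an unknown column name
    # .get(..., []) yields an empty result instead of raising for an unseen value
    return [VALUES[i] for i in index.get(column_value, [])]


print(retrieve("age", 35))   # [[35, 'm']]
print(retrieve("sex", "f"))  # [[20, 'f']]
print(retrieve("age", 99))   # []
```

Adding a column then only means adding one entry to INDEXES rather than another branch.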
If you want a dictionary, you can do the following:
[dict(zip(KEYS, values)) for values in retrieve("age", 35)]
# [{'age': 35, 'sex': 'm'}]
but again, dictionaries are a little heavy on the memory side, so if you can go with lists of values it might be better.
Dictionary lookups are O(1) on average (O(n) in the worst case) and list indexing is O(1), so this should be pretty fast. Maintaining the structure will be a bit of a pain, but not too much. To "write" a record, you'd just have to append it to the VALUES list and then append its index in VALUES to the matching entry of each column dictionary.
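The write path described above can be sketched as follows. This is a minimal illustration assuming the two-column (age, sex) layout from the example; insert and INDEXES are hypothetical names, not part of the original answer.

```python
# Minimal sketch of the "write" path, assuming the (age, sex) layout.
KEYS = ["age", "sex"]                 # column order inside each record
VALUES = []                           # the records themselves, as lists
INDEXES = {key: {} for key in KEYS}   # column name -> {value: [record indices]}


def insert(record):
    """Append `record` (a list ordered like KEYS) and index each of its columns."""
    position = len(VALUES)  # index the record will occupy in VALUES
    VALUES.append(record)
    for key, value in zip(KEYS, record):
        # setdefault creates the index list the first time a value is seen
        INDEXES[key].setdefault(value, []).append(position)
    return position


def retrieve(column_name, column_value):
    return [VALUES[i] for i in INDEXES[column_name].get(column_value, [])]


insert([35, "m"])
insert([20, "f"])
insert([35, "f"])
print(retrieve("age", 35))  # [[35, 'm'], [35, 'f']]
print(INDEXES["sex"])       # {'m': [0], 'f': [1, 2]}
```

Because records are only ever appended, a record's index is just the length of VALUES before the append, and existing indices never shift.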
Of course, it would be best to benchmark your actual implementation and look for potential improvements, but hopefully this makes sense and will get you going :)
EDIT:
Please note that, as @moooeeeep said, this will only work if your values are hashable and can therefore be used as dictionary keys.
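A quick way to check the hashability requirement is to call the built-in hash() on a value; it raises TypeError for unhashable types such as lists and dicts. The hashable helper below is a hypothetical sketch, and converting a list to a tuple is one possible workaround, not something the answer prescribes:

```python
def hashable(value):
    """Return True if `value` can be used as a dictionary key."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


print(hashable(35))         # True
print(hashable("m"))        # True
print(hashable([1, 2]))     # False: lists are mutable and unhashable
print(hashable((1, 2)))     # True: a tuple of hashable items is hashable
print(hashable((1, [2])))   # False: the tuple contains a list

# Assumed workaround: store a list-valued column under its tuple form.
TAGS = {}
TAGS.setdefault(tuple(["a", "b"]), []).append(0)
print(TAGS)  # {('a', 'b'): [0]}
```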