Using the base idea from How to "perfectly" override a dict?, I wrote a dict-like class that should support assigning dot-delimited keys, i.e. Extendeddict({'level1.level2': 'value'}) == {'level1': {'level2': 'value'}}
The code is:
import collections
import numpy


class Extendeddict(collections.MutableMapping):
    """Dictionary overload class that adds functions to support chained keys, e.g. A.B.C
    :rtype : Extendeddict
    """
    # noinspection PyMissingConstructor
    def __init__(self, *args, **kwargs):
        self._store = dict()
        self.update(dict(*args, **kwargs))

    def __getitem__(self, key):
        keys = self._keytransform(key)
        print 'Original key: {0}\nTransformed keys: {1}'.format(key, keys)
        if len(keys) == 1:
            return self._store[key]
        else:
            key1 = '.'.join(keys[1:])
            if keys[0] in self._store:
                subdict = Extendeddict(self[keys[0]] or {})
                try:
                    return subdict[key1]
                except:
                    raise KeyError(key)
            else:
                raise KeyError(key)

    def __setitem__(self, key, value):
        keys = self._keytransform(key)
        if len(keys) == 1:
            self._store[key] = value
        else:
            key1 = '.'.join(keys[1:])
            subdict = Extendeddict(self.get(keys[0]) or {})
            subdict.update({key1: value})
            self._store[keys[0]] = subdict._store

    def __delitem__(self, key):
        keys = self._keytransform(key)
        if len(keys) == 1:
            del self._store[key]
        else:
            key1 = '.'.join(keys[1:])
            del self._store[keys[0]][key1]
            if not self._store[keys[0]]:
                del self._store[keys[0]]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)

    def __repr__(self):
        return self._store.__repr__()

    # noinspection PyMethodMayBeStatic
    def _keytransform(self, key):
        try:
            return key.split('.')
        except:
            return [key]
But with Python 2.7.10 and numpy 1.11.0, running
basic = {'Test.field': 'test'}
print 'Normal dictionary: {0}'.format(basic)
print 'Normal dictionary in a list: {0}'.format([basic])
print 'Normal dictionary in numpy array: {0}'.format(numpy.array([basic], dtype=object))
print 'Normal dictionary in numpy array.tolist(): {0}'.format(numpy.array([basic], dtype=object).tolist())
extended_dict = Extendeddict(basic)
print 'Extended dictionary: {0}'.format(extended_dict)
print 'Extended dictionary in a list: {0}'.format([extended_dict])
print 'Extended dictionary in numpy array: {0}'.format(numpy.array([extended_dict], dtype=object))
print 'Extended dictionary in numpy array.tolist(): {0}'.format(numpy.array([extended_dict], dtype=object).tolist())
I get:
Normal dictionary: {'Test.field': 'test'}
Normal dictionary in a list: [{'Test.field': 'test'}]
Normal dictionary in numpy array: [{'Test.field': 'test'}]
Normal dictionary in numpy array.tolist(): [{'Test.field': 'test'}]
Original key: Test
Transformed keys: ['Test']
Extended dictionary: {'Test': {'field': 'test'}}
Extended dictionary in a list: [{'Test': {'field': 'test'}}]
Original key: 0
Transformed keys: [0]
Traceback (most recent call last):
  File "/tmp/scratch_2.py", line 77, in <module>
    print 'Extended dictionary in numpy array: {0}'.format(numpy.array([extended_dict], dtype=object))
  File "/tmp/scratch_2.py", line 20, in __getitem__
    return self._store[key]
KeyError: 0
Whereas I would expect print 'Extended dictionary in numpy array: {0}'.format(numpy.array([extended_dict], dtype=object)) to result in Extended dictionary in numpy array: [{'Test': {'field': 'test'}}]
Any suggestions on what might be going wrong here? Is this even the right way to do it?
The problem is in the np.array constructor step. It digs into its inputs trying to create a higher dimensional array.
In [99]: basic={'test.field':'test'}
In [100]: eb=Extendeddict(basic)
In [104]: eba=np.array([eb],object)
<keys: 0,[0]>
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-104-5591a58c168a> in <module>()
----> 1 eba=np.array([eb],object)

<ipython-input-88-a7d937b1c8fd> in __getitem__(self, key)
     11         keys = self._keytransform(key);print key;print keys
     12         if len(keys) == 1:
---> 13             return self._store[key]
     14         else:
     15             key1 = '.'.join(keys[1:])

KeyError: 0
But if I make an array and then assign the object to it, it works fine:
In [105]: eba=np.zeros((1,),object)
In [106]: eba[0]=eb
In [107]: eba
Out[107]: array([{'test': {'field': 'test'}}], dtype=object)
np.array is a tricky function to use with dtype=object. Compare np.array([[1,2],[2,3]],dtype=object) and np.array([[1,2],[2]],dtype=object). One is (2,2), the other (2,). It tries to make a 2d array, and resorts to 1d with list elements only if that fails. Something along that line is happening here.
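The shape difference described above can be checked directly. This is a minimal sketch that assumes Python 3 and a recent NumPy (the post itself uses Python 2.7 and numpy 1.11); the variable names are illustrative only.

```python
import numpy as np

# Equal-length inner lists: np.array builds a 2-D array of objects.
regular = np.array([[1, 2], [2, 3]], dtype=object)

# Unequal inner lists: no 2-D shape fits, so the result is a 1-D array
# whose elements are the original Python lists.
ragged = np.array([[1, 2], [2]], dtype=object)

print(regular.shape)
print(ragged.shape)
print(type(ragged[0]))
```

The explicit dtype=object matters here: without it, newer NumPy releases may refuse the ragged input instead of falling back to a 1-D array.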
I see 2 solutions: one is this roundabout way of constructing the array, which I've used on other occasions. The other is to figure out why np.array doesn't dig into dict but does with yours. np.array is compiled, so that may require reading tough GitHub code.
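The roundabout construction could be wrapped in a small helper. This is only a sketch assuming Python 3 and a recent NumPy; as_object_array is a hypothetical name, not a NumPy function.

```python
import numpy as np


def as_object_array(items):
    """Return a 1-D object array holding each item as one element.

    Hypothetical helper, not part of NumPy.
    """
    items = list(items)
    # Allocate the container first so np.array never inspects the items.
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        # Integer-index assignment stores the object itself.
        out[i] = item
    return out


# Equal-length lists would become a (2, 2) array with np.array;
# the helper keeps them as two list elements instead.
rows = [[1, 2], [3, 4]]
arr = as_object_array(rows)
print(arr.shape)
```

Because the elements are assigned one at a time, the result should stay one-dimensional regardless of whether an element defines __len__ or __getitem__.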
I tried a solution with f=np.frompyfunc(lambda x:x,1,1), but that doesn't work (see my edit history for details). But I found that mixing an Extendeddict with a dict does work:
In [139]: np.array([eb,basic])
Out[139]: array([{'test': {'field': 'test'}}, {'test.field': 'test'}], dtype=object)
So does mixing it with something else, like None or an empty list:
In [140]: np.array([eb,[]])
Out[140]: array([{'test': {'field': 'test'}}, []], dtype=object)
In [142]: np.array([eb,None])[:-1]
Out[142]: array([{'test': {'field': 'test'}}], dtype=object)
This is another common trick for constructing an object array of lists.
It also works if you give it two or more Extendeddict objects with different lengths, e.g. np.array([eb, Extendeddict({})]). In other words, it works if the len(...) values differ (just as with mixed lists).
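The padding trick mentioned above, shown for plain lists. This is a sketch that assumes Python 3 and a recent NumPy, where dtype=object has to be passed explicitly for ragged input; the names are illustrative.

```python
import numpy as np

rows = [[1, 2], [3, 4]]

# Appending None makes the input ragged, so np.array stops at one
# dimension; slicing then drops the padding element again.
padded = np.array(rows + [None], dtype=object)
arr = padded[:-1]

print(arr.shape)
print(arr[0])
```

Without the padding element, the same two lists would be combined into a (2, 2) array.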
Numpy tries to do what it's supposed to do:
Numpy checks for each element if it's iterable (by using len and iter) because what you pass in may be interpreted as a multidimensional array.
There is a catch here: dict-like classes (meaning isinstance(element, dict) == True) will not be interpreted as another dimension (that is why passing in [{}] works). Probably they should check if it's a collections.Mapping instead of a dict. Maybe you can file a bug on their issue tracker.
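The distinction between "is a mapping" and "is a dict" can be seen without numpy. A minimal sketch, assuming Python 3 (where the abstract classes live in collections.abc); both class names are hypothetical stand-ins for Extendeddict.

```python
from collections.abc import Mapping, MutableMapping


class PlainMapping(MutableMapping):
    """Mapping built only on the abstract base class (hypothetical)."""

    def __init__(self):
        self._store = {}

    def __getitem__(self, key):
        return self._store[key]

    def __setitem__(self, key, value):
        self._store[key] = value

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


class DictBackedMapping(PlainMapping, dict):
    """Same behaviour, but with dict added to the bases."""


plain = PlainMapping()
backed = DictBackedMapping()

# Both are mappings, but only the second passes a dict check.
print(isinstance(plain, Mapping), isinstance(plain, dict))
print(isinstance(backed, Mapping), isinstance(backed, dict))
```

A check written against dict therefore misses the first class even though it behaves like a mapping in every other respect.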
If you change your class definition to:
class Extendeddict(collections.MutableMapping, dict):
    ...
or change your __len__ method:
def __len__(self):
    raise NotImplementedError
it works. Neither of these might be something that you want to do, but numpy just uses duck typing to create the array, and without subclassing directly from dict or making len inaccessible, numpy sees your class as something that ought to be another dimension. This is rather clever and convenient in case you want to pass in customized sequences (subclasses of collections.Sequence) but inconvenient for collections.Mapping or collections.MutableMapping. I think this is a bug.
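A sketch of the first workaround, assuming Python 3 and a recent NumPy rather than the Python 2.7 / numpy 1.11 setup from the question. DictBackedMapping is a hypothetical, simplified stand-in for Extendeddict without the dotted-key logic.

```python
import numpy as np
from collections.abc import MutableMapping


class DictBackedMapping(MutableMapping, dict):
    """Simplified stand-in for Extendeddict with dict added to the bases."""

    def __init__(self, *args, **kwargs):
        # The data still lives in _store; dict is only there for the type check.
        self._store = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self._store[key]

    def __setitem__(self, key, value):
        self._store[key] = value

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)

    def __repr__(self):
        return repr(self._store)


wrapped = DictBackedMapping({'Test': {'field': 'test'}})
arr = np.array([wrapped], dtype=object)

# The instance is stored as a single element instead of being unpacked.
print(arr.shape)
print(arr[0] is wrapped)
```

One design caveat: the inherited dict storage stays empty here, so any dict method that is not overridden would see no data. That is why the sketch overrides __repr__ explicitly.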
Source: https://stackoverflow.com/questions/36663919/override-a-dict-with-numpy-support