As I did a bit test, a python dict of int=>int (different value) of 30 million items can easily eats >2G memory on my mac. Since I work with only int to int dict, is there a
There are at least two possibilities:
arrays
You could try using two arrays. One for the keys, and one for the values so that index(key) == index(value)
Updated 2017-01-05: use 4-byte integers in array.
An array would use less memory. On a 64-bit FreeBSD machine with python compiled with clang, an array of 30 million integers uses around 117 MiB.
These are the python commands I used:
Python 2.7.13 (default, Dec 28 2016, 20:51:25)
[GCC 4.2.1 Compatible FreeBSD Clang 3.8.0 (tags/RELEASE_380/final 262564)] on freebsd11
Type "help", "copyright", "credits" or "license" for more information.
>>> from array import array
>>> a = array('i', xrange(30000000))
>>> a.itemsize
4
After importing array, ps
reports:
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
rsmith 81023 0.0 0.2 35480 8100 0 I+ 20:35 0:00.03 python (python2.7)
After making the array:
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
rsmith 81023 29.0 3.1 168600 128776 0 S+ 20:35 0:04.52 python (python2.7)
The Resident Set Size is reported in 1 KiB units, so (128776 - 8100)/1024 = 117 MiB
With list comprehensions you could easily get a list of indices where the key meets a certain condition. You can then use the indices in that list to access the corresponding values...
numpy
If you have numpy available, using that is faster, has lots more features and and uses slightly less RAM:
Python 2.7.5 (default, Jun 10 2013, 19:54:11)
[GCC 4.2.1 Compatible FreeBSD Clang 3.1 ((branches/release_31 156863))] on freebsd9
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> a = np.arange(0, 30000000, dtype=np.int32)
From ps
: 6700 KiB after starting Python, 17400 KiB after import numpy and 134824 KiB after creating the array. That's around 114 MiB.
Furthermore, numpy supports record arrays;
Python 2.7.5 (default, Jun 10 2013, 19:54:11)
[GCC 4.2.1 Compatible FreeBSD Clang 3.1 ((branches/release_31 156863))] on freebsd9
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> a = np.zeros((10,), dtype=('i4,i4'))
>>> a
array([(0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0),
(0, 0), (0, 0)],
dtype=[('f0', '>> a.dtype.names
('f0', 'f1')
>>> a.dtype.names = ('key', 'value')
>>> a
array([(0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0),
(0, 0), (0, 0)],
dtype=[('key', '>> a[3] = (12, 5429)
>>> a
array([(0, 0), (0, 0), (0, 0), (12, 5429), (0, 0), (0, 0), (0, 0), (0, 0),
(0, 0), (0, 0)],
dtype=[('key', '>> a[3]['key']
12
Here you can access the keys and values separately;
>>> a['key']
array([ 0, 0, 0, 12, 0, 0, 0, 0, 0, 0], dtype=int32)