python: class vs tuple huge memory overhead (?)

Question

I'm storing a lot of complex data in tuples/lists, but would prefer to use small wrapper classes to make the data structures easier to understand, e.g.

class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

p = Person('foo', 'bar')
print(p.last)
...

would be preferable over

p = ['foo', 'bar']
print(p[1])
...

However, there seems to be a horrible memory overhead:

l = [Person('foo', 'bar') for i in range(10000000)]
# ipython now takes 1.7 GB RAM

and

del l
l = [('foo', 'bar') for i in range(10000000)]
# now just 118 MB RAM

Why? Is there an obvious alternative solution that I didn't think of?

Thanks!

(I know, in this example the 'wrapper' class looks silly. But when the data becomes more complex and nested, it is more useful)

Answer 1:

As others have said in their answers, you'll have to generate different objects for the comparison to make sense.

So, let's compare some approaches.

`tuple`

l = [(i, i) for i in range(10000000)]
# memory taken by Python3: 1.0 GB

`class Person`

class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

l = [Person(i, i) for i in range(10000000)]
# memory: 2.0 GB

`namedtuple` (`tuple` + `__slots__`)

from collections import namedtuple
Person = namedtuple('Person', 'first last')

l = [Person(i, i) for i in range(10000000)]
# memory: 1.1 GB

namedtuple is basically a class that extends tuple and sets __slots__ to an empty tuple, so instances carry no per-instance __dict__. The named fields are exposed through getters on the class, and a few helper methods are added (on Python versions before 3.7 you can see the exact generated code by passing verbose=True).
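The structure described above can be inspected directly. This is a small sketch, assuming CPython 3 and the standard library only; the variable names are illustrative.

```python
from collections import namedtuple

Person = namedtuple('Person', 'first last')
p = Person('foo', 'bar')

# A namedtuple instance is a real tuple, so indexing still works.
is_tuple = isinstance(p, tuple)

# The generated class declares an empty __slots__, so instances get no __dict__.
slots = Person.__slots__
has_instance_dict = hasattr(p, '__dict__')

# Attribute access and index access return the same stored object.
same_value = p.last is p[1]

print(is_tuple, slots, has_instance_dict, same_value)
```

Because the values live in the tuple itself, the only per-instance cost beyond a plain tuple is whatever the subclass adds, which matches the small gap between the two measurements above.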

`class Person` + `__slots__`

class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

l = [Person(i, i) for i in range(10000000)]
# memory: 0.9 GB

This is a trimmed-down version of the namedtuple above. In this measurement it is the clear winner, even smaller than plain tuples.
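The figures above were read from the process's memory usage. A rough way to reproduce the comparison inside a script is sketched below using tracemalloc from the standard library. The helper name measure, the class names, and the reduced object count are assumptions made for illustration, and the absolute numbers will differ between Python versions and platforms.

```python
import tracemalloc
from collections import namedtuple


class PersonDict:
    """Regular class: attributes are stored per instance."""
    def __init__(self, first, last):
        self.first = first
        self.last = last


class PersonSlots:
    """Same class, but with fixed slots instead of a per-instance dict."""
    __slots__ = ('first', 'last')

    def __init__(self, first, last):
        self.first = first
        self.last = last


PersonNT = namedtuple('PersonNT', 'first last')


def measure(factory, n=100_000):
    """Return the bytes allocated while building a list of n objects."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    data = [factory(i, i) for i in range(n)]  # kept alive until measured
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


results = {
    'tuple': measure(lambda a, b: (a, b)),
    'class': measure(PersonDict),
    'namedtuple': measure(PersonNT),
    'class + __slots__': measure(PersonSlots),
}

for name, size in results.items():
    print(f'{name:20s} {size / 1e6:8.1f} MB')
```

tracemalloc only counts allocations made through Python's allocators while tracing is active, so treat the output as a relative comparison rather than the total footprint of the interpreter.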

Answer 2:

Using __slots__ decreases the memory footprint quite a bit (from 1.7 GB to 625 MB in my test), since each instance no longer needs to hold a dict to store the attributes.

class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

The drawback is that you can no longer add attributes to an instance after it is created; the class only provides memory for the attributes listed in the __slots__ attribute.
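Both effects described here (no per-instance dict, and no new attributes after creation) can be seen in a short sketch. The class names PersonDict and PersonSlots are made up for this illustration.

```python
class PersonDict:
    """Regular class: attributes live in a per-instance __dict__."""
    def __init__(self, first, last):
        self.first = first
        self.last = last


class PersonSlots:
    """Slotted class: storage is reserved only for the listed names."""
    __slots__ = ('first', 'last')

    def __init__(self, first, last):
        self.first = first
        self.last = last


p_dict = PersonDict('foo', 'bar')
p_slots = PersonSlots('foo', 'bar')

has_dict_regular = hasattr(p_dict, '__dict__')
has_dict_slotted = hasattr(p_slots, '__dict__')

# The regular instance accepts arbitrary new attributes.
p_dict.age = 42

# The slotted instance rejects names that are not listed in __slots__.
try:
    p_slots.age = 42
    extra_attr_allowed = True
except AttributeError:
    extra_attr_allowed = False

print(has_dict_regular, has_dict_slotted, extra_attr_allowed)
```

If the flexibility is occasionally needed, adding '__dict__' to __slots__ restores it, at the cost of giving back most of the memory saving.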

Answer 3:

There is yet another way to reduce the amount of memory occupied by objects: turning off support for cyclic garbage collection, in addition to dropping __dict__ and __weakref__. This is implemented in the recordclass library:

$ pip install recordclass

>>> import sys
>>> from recordclass import dataobject, make_dataclass

Create the class, either with a class statement:

class Person(dataobject):
    first: str
    last: str

or with the factory function:

>>> Person = make_dataclass('Person', 'first last')

As a result:

>>> print(sys.getsizeof(Person(100,100)))
32

For a __slots__-based class we have:

class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

>>> print(sys.getsizeof(Person(100,100)))
64

So further memory savings are possible.

For the dataobject-based class:

l = [Person(i, i) for i in range(10000000)]
# memory size: 681 MB

For the __slots__-based class:

l = [Person(i, i) for i in range(10000000)]
# memory size: 921 MB

Answer 4:

In your second example, you only create one object, because the tuple ('foo', 'bar') is a constant: every element of the list refers to the same tuple.

>>> l = [('foo', 'bar') for i in range(10000000)]
>>> id(l[0])
4330463176
>>> id(l[1])
4330463176

Classes have the overhead that their attributes are stored in a per-instance dictionary. That is why namedtuples need only about half the memory.
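The first point of this answer, that the literal tuple is shared, can be verified by counting distinct object identities. This is a sketch that relies on CPython behaviour (constant tuples are built once at compile time); other implementations may differ.

```python
n = 1000

# Every iteration evaluates the same compile-time constant tuple.
constant_items = [('foo', 'bar') for i in range(n)]

# Every iteration builds a fresh tuple because the content depends on i.
distinct_items = [(i, 'bar') for i in range(n)]

n_constant = len({id(t) for t in constant_items})
n_distinct = len({id(t) for t in distinct_items})

print(n_constant, n_distinct)
```

This is why a fair memory comparison has to build objects with varying content, such as (i, i), as the other answers do.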

Source: https://stackoverflow.com/questions/45123238/python-class-vs-tuple-huge-memory-overhead

Tags: python, list, class, data-structures, tuples
