python: class vs tuple huge memory overhead (?)

Posted by 三世轮回 on 2019-12-04 03:19:07

As others have said in their answers, you'll have to generate different objects for the comparison to make sense.

So, let's compare some approaches.

tuple

l = [(i, i) for i in range(10000000)]
# memory taken by Python3: 1.0 GB

class Person

class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

l = [Person(i, i) for i in range(10000000)]
# memory: 2.0 GB

namedtuple (tuple + __slots__)

from collections import namedtuple
Person = namedtuple('Person', 'first last')

l = [Person(i, i) for i in range(10000000)]
# memory: 1.1 GB

namedtuple is basically a class that extends tuple and sets an empty __slots__, so instances carry no per-instance __dict__. It adds a getter for each named field and some other helper methods (in Python versions before 3.7 you can see the exact generated code by calling it with verbose=True).
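To check that description yourself, a small sketch like the following inspects a generated class. The sample values are made up for illustration; the attributes inspected are the ones namedtuple itself provides.

```python
from collections import namedtuple

Person = namedtuple('Person', 'first last')
p = Person('Ada', 'Lovelace')  # sample values, purely illustrative

print(issubclass(Person, tuple))  # the generated class subclasses tuple
print(Person.__slots__)           # empty tuple: no per-instance __dict__
print(hasattr(p, '__dict__'))     # so instances have no attribute dict
print(p.first == p[0])            # the field getter reads tuple index 0
print(p._asdict())                # one of the generated helper methods
```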

class Person + __slots__

class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

l = [Person(i, i) for i in range(10000000)]
# memory: 0.9 GB

This is a trimmed-down version of namedtuple above. A clear winner, even better than pure tuples.
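The numbers above depend on the Python version and on how memory is measured, and the answer does not say which tool it used. As a rough way to rerun the comparison, here is a sketch that assumes tracemalloc from the standard library and a much smaller list; the names measure, PlainPerson and SlotsPerson are hypothetical helpers, not part of the original answer.

```python
import tracemalloc


class PlainPerson:
    def __init__(self, first, last):
        self.first = first
        self.last = last


class SlotsPerson:
    __slots__ = ['first', 'last']

    def __init__(self, first, last):
        self.first = first
        self.last = last


def measure(factory, n=100_000):
    """Return the bytes allocated while building a list of n objects."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    items = [factory(i, i) for i in range(n)]  # kept alive until we measure
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


plain_bytes = measure(PlainPerson)
slots_bytes = measure(SlotsPerson)
tuple_bytes = measure(lambda a, b: (a, b))

for name, size in [('class', plain_bytes),
                   ('class + __slots__', slots_bytes),
                   ('tuple', tuple_bytes)]:
    print(f'{name}: {size / 1e6:.1f} MB')
```

tracemalloc counts only allocations made through Python's allocator, so the absolute values will not match process-level figures like the ones quoted above; the relative ordering is the useful part.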

chepner

Using __slots__ decreases the memory footprint quite a bit (from 1.7 GB to 625 MB in my test), since each instance no longer needs to hold a dict to store the attributes.

class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

The drawback is that you can no longer add attributes to an instance after it is created; the class only provides memory for the attributes listed in the __slots__ attribute.
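A minimal sketch of that drawback, using made-up sample values and a hypothetical extra attribute named age:

```python
class Person:
    __slots__ = ['first', 'last']

    def __init__(self, first, last):
        self.first = first
        self.last = last


p = Person('Ada', 'Lovelace')
p.last = 'King'  # assigning to a declared slot still works

try:
    p.age = 36   # 'age' is not listed in __slots__
except AttributeError as exc:
    error = exc
else:
    error = None

print(type(error).__name__)    # the assignment above was rejected
print(hasattr(p, '__dict__'))  # there is no per-instance dict to fall back on
```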

There is yet another way to reduce the memory occupied by objects: turn off support for cyclic garbage collection, in addition to dropping __dict__ and __weakref__. This is implemented in the recordclass library:

$ pip install recordclass

>>> import sys
>>> from recordclass import dataobject, make_dataclass

Create the class:

class Person(dataobject):
    first: str
    last: str

or

>>> Person = make_dataclass('Person', 'first last')

As a result:

>>> print(sys.getsizeof(Person(100,100)))
32

For a __slots__-based class we have:

class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

>>> print(sys.getsizeof(Person(100,100)))
64

So further memory savings are possible.

For dataobject-based:

l = [Person(i, i) for i in range(10000000)]
# memory: 681 MB

For __slots__-based:

l = [Person(i, i) for i in range(10000000)]
# memory: 921 MB

In your second example you create only one object: a tuple literal made up entirely of constants is itself a constant, so every iteration reuses the same tuple.

>>> l = [('foo', 'bar') for i in range(10000000)]
>>> id(l[0])
4330463176
>>> id(l[1])
4330463176
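For contrast, here is a small sketch comparing the constant tuple with tuples built from the loop variable. Reusing the constant is a CPython implementation detail, so treat the first result as an assumption about that interpreter rather than a language guarantee.

```python
# Every element comes from the same constant tuple literal.
constant = [('foo', 'bar') for i in range(3)]

# Every element is built at run time from the loop variable.
distinct = [(i, i) for i in range(3)]

same_constant = all(t is constant[0] for t in constant)
same_distinct = all(t is distinct[0] for t in distinct)

print(same_constant)  # whether the list holds one shared tuple
print(same_distinct)  # whether the run-time tuples are shared too
```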

Classes have the overhead that their attributes are stored in a per-instance dictionary. That is why namedtuples need only about half the memory.
