SqlAlchemy optimizations for read-only object models

问题

I have a complex network of objects being spawned from a sqlite database using sqlalchemy ORM mappings. I have quite a few deeply nested:

for parent in owner.collection: 
    for child in parent.collection: 
        for foo in child.collection: 
            do lots of calcs with foo.property

My profiling is showing me that the sqlalchemy instrumentation is taking a lot of time in this use case.

The thing is: I don't ever change the object model (mapped properties) at runtime, so once they are loaded I don't NEED the instrumentation, or indeed any sqlalchemy overhead at all. After much research, I'm thinking I might have to clone a 'pure python' set of objects from my already loaded 'instrumented objects', but that would be a pain.

Performance is really crucial here (it's a simulator), so maybe writing those layers as C extensions using sqlite api directly would be best. Any thoughts?

回答1:

If you reference a single attribute of a single instance lots of times, a simple trick is to store it in a local variable.

If you want a way to create cheap pure python clones, share the dict object with the original object:

class CheapClone(object):
    def __init__(self, original):
        self.__dict__ = original.__dict__

Creating a copy like this costs about half of the instrumented attribute access and attribute lookups are as fast as normal.

There might also be a way to have the mapper create instances of an uninstrumented class instead of the instrumented one. If I have some time, I might take a look how deeply ingrained is the assumption that populated instances are of the same type as the instrumented class.

Found a quick and dirty way that seems to at least somewhat work on 0.5.8 and 0.6. Didn't test it with inheritance or other features that might interact badly. Also, this touches some non-public API's, so beware of breakage when changing versions.

from sqlalchemy.orm.attributes import ClassManager, instrumentation_registry

class ReadonlyClassManager(ClassManager):
    """Enables configuring a mapper to return instances of uninstrumented 
    classes instead. To use add a readonly_type attribute referencing the
    desired class to use instead of the instrumented one."""
    def __init__(self, class_):
        ClassManager.__init__(self, class_)
        self.readonly_version = getattr(class_, 'readonly_type', None)
        if self.readonly_version:
            # default instantiation logic doesn't know to install finders
            # for our alternate class
            instrumentation_registry._dict_finders[self.readonly_version] = self.dict_getter()
            instrumentation_registry._state_finders[self.readonly_version] = self.state_getter()

    def new_instance(self, state=None):
        if self.readonly_version:
            instance = self.readonly_version.__new__(self.readonly_version)
            self.setup_instance(instance, state)
            return instance
        return ClassManager.new_instance(self, state)

Base = declarative_base()
Base.__sa_instrumentation_manager__ = ReadonlyClassManager

Usage example:

class ReadonlyFoo(object):
    pass

class Foo(Base, ReadonlyFoo):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    name = Column(String(32))

    readonly_type = ReadonlyFoo

assert type(session.query(Foo).first()) is ReadonlyFoo