Django: Why is Foo.objects.extra(…) So Much Faster Than Foo.objects.raw?

问题

So I am trying to optimize a fairly odd query, but this is a legacy database so I make do with what I have. These are the queries I am trying. They provide the same output at this point. w is my queryset.

def future_schedule(request):

    past = datetime.date.today()-datetime.timedelta(days=730)

    extra_select = {
        'addlcomplete': 'SELECT Complete FROM tblAdditionalDates WHERE Checkin.ShortSampleID = tblAdditionalDates.ShortSampleID',
        'addldate': 'SELECT AddlDate FROM tblAdditionalDates WHERE Checkin.ShortSampleID = tblAdditionalDates.ShortSampleID'
    }
    extra_where = ['''(Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > %s AND Checkin.DateCompleted IS NULL AND Checkin.Canceled = 0) OR (Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > %s AND Checkin.DateCompleted IS NOT NULL AND Checkin.DateFinalCompleted IS NULL AND Checkin.DateFinalExpected IS NOT NULL AND Checkin.Canceled = 0) '''
    ]
    extra_params = [past, past]

    w = Checkin.objects.extra(select=extra_select, where=extra_where, params=extra_params)

# OR This one

    w = Checkin.objects.raw('''SELECT Checkin.SampleID, Checkin.ShortSampleID, Checkin.Company, A.Complete, Checkin.HasDates, A.AddlDate FROM Checkin LEFT JOIN (SELECT ShortSampleID, Complete, AddlDate FROM tblAdditionalDates) A ON A.ShortSampleID = Checkin.ShortSampleID WHERE (Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > "2009-01-01" AND Checkin.DateCompleted IS NULL AND Checkin.Canceled = 0) OR (Checkin.Description <> "Sterilization Permit" AND Checkin.Description <> "Registration State" AND Checkin.Description <> "Miscellaneous" AND Checkin.Description <> "Equipment Purchase" AND Checkin.DateArrived > "2009-01-01" AND Checkin.DateCompleted IS NOT NULL AND Checkin.DateFinalCompleted IS NULL AND Checkin.DateFinalExpected IS NOT NULL AND Checkin.Canceled = 0)''')

Both of these return the same number of records (322). .extra is about 10 seconds faster in rendering the HTML than the .raw query and for all intensive purposes, the .raw query is mildly less complex even. Does anyone have any insight as to why this might be? Based on my structure, .raw may be the only way I get the data I need (I need the addlcomplete and addldate in the extra_select dict and use them in a Having clause to further filter the queryset) but I certainly don't like how long it is taking. Is it on the template layer that it is slower or the actual query layer? How can I best debug this?

Thank for your help in this quest for optimization amidst poor data structures.

UPDATE 1: 2011-10-03

So I installed django-debugtoolbar to snoop around a bit and I eneabled MySQL general logging and came up with the following:

using .filter() or .extra() Total Query count is 2. Using .raw() Total Query count is 1984!!! (Spooky literary reference not ignored)

My template is using a regroup and then looping through that regroup. No relations are being followed, no template tags other than builtins are being used. Select_related is NOT being used and I still only get the 2 queries. Looking at the mysql log, sure enough - 1984 queries.

When looking at the queries that were executed, basically it looks like for every {{ Modelinstance.field }} django was doing a SELECT pk, field FROM Model WHERE Model.pk = Modelinstance.pk This seems completely wrong if you ask me. Am I missing something here or is django really running wild with queries?

END UPDATE 1

UPDATE 2 See answer below

Greg

回答1:

Ok. Here are my final conclusions. While Furbeenator is correct about the internal Django optimizations, turns out there is a much larger, user error that caused the slowdown and the aforementioned thousands of queries.

It is clearly documented in the Raw queryset docs that when you defer fields (i.e. not using SELECT * FROM ...) and are selecting only certain fields specifically (SELECT Checkin.Sampleid, ... the fields that you don't select can still be accessed but with another database call. So, if you are selecting a subset of fields in your raw query and you forgot a field in your query that you use in your template, Django performs a database lookup to find that field you are referencing in your template rather than complaining about it not existing or whatever. So, let's say you leave out 5 fields from your query (which is what I did) that you end up referencing in your template and you have 300 records that you are looping through. This incurs 1500 extra database hits to get those 5 fields for each record.

So, beware of hidden references and thank god for Django Debug Toolbar

回答2:

From the Optimization section: Database access optimization, they suggest ways to optimize, one of which is the extra() method. Then they mention .raw(). It is my assumption that they made raw() much more robust and powerful so it offered maximum flexibility over optimization. Performing raw SQL queries allows you to do a lot more than extra(). My hunch is that it is just geared more toward flexibility than performance and extra() should be used over raw() where possible.

来源：https://stackoverflow.com/questions/7636859/django-why-is-foo-objects-extra-so-much-faster-than-foo-objects-raw

标签

django

django-models

django-templates

query-optimization

django-queryset