Chaining multiple filter() in Django, is this a bug?

柔情痞子 提交于 2019-11-26 06:15:32

问题


I always assumed that chaining multiple filter() calls in Django was always the same as collecting them in a single call.

# Equivalent
Model.objects.filter(foo=1).filter(bar=2)
Model.objects.filter(foo=1,bar=2)

but I have run across a complicated queryset in my code where this is not the case

class Inventory(models.Model):
    book = models.ForeignKey(Book)

class Profile(models.Model):
    user = models.OneToOneField(auth.models.User)
    vacation = models.BooleanField()
    country = models.CharField(max_length=30)

# Not Equivalent!
Book.objects.filter(inventory__user__profile__vacation=False).filter(inventory__user__profile__country=\'BR\')
Book.objects.filter(inventory__user__profile__vacation=False, inventory__user__profile__country=\'BR\')

The generated SQL is

SELECT \"library_book\".\"id\", \"library_book\".\"asin\", \"library_book\".\"added\", \"library_book\".\"updated\" FROM \"library_book\" INNER JOIN \"library_inventory\" ON (\"library_book\".\"id\" = \"library_inventory\".\"book_id\") INNER JOIN \"auth_user\" ON (\"library_inventory\".\"user_id\" = \"auth_user\".\"id\") INNER JOIN \"library_profile\" ON (\"auth_user\".\"id\" = \"library_profile\".\"user_id\") INNER JOIN \"library_inventory\" T5 ON (\"library_book\".\"id\" = T5.\"book_id\") INNER JOIN \"auth_user\" T6 ON (T5.\"user_id\" = T6.\"id\") INNER JOIN \"library_profile\" T7 ON (T6.\"id\" = T7.\"user_id\") WHERE (\"library_profile\".\"vacation\" = False  AND T7.\"country\" = BR )
SELECT \"library_book\".\"id\", \"library_book\".\"asin\", \"library_book\".\"added\", \"library_book\".\"updated\" FROM \"library_book\" INNER JOIN \"library_inventory\" ON (\"library_book\".\"id\" = \"library_inventory\".\"book_id\") INNER JOIN \"auth_user\" ON (\"library_inventory\".\"user_id\" = \"auth_user\".\"id\") INNER JOIN \"library_profile\" ON (\"auth_user\".\"id\" = \"library_profile\".\"user_id\") WHERE (\"library_profile\".\"vacation\" = False  AND \"library_profile\".\"country\" = BR )

The first queryset with the chained filter() calls joins the Inventory model twice effectively creating an OR between the two conditions whereas the second queryset ANDs the two conditions together. I was expecting that the first query would also AND the two conditions. Is this the expected behavior or is this a bug in Django?

The answer to a related question Is there a downside to using ".filter().filter().filter()..." in Django? seems to indicated that the two querysets should be equivalent.


回答1:


The way I understand it is that they are subtly different by design (and I am certainly open for correction): filter(A, B) will first filter according to A and then subfilter according to B, while filter(A).filter(B) will return a row that matches A 'and' a potentially different row that matches B.

Look at the example here:

https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships

particularly:

Everything inside a single filter() call is applied simultaneously to filter out items matching all those requirements. Successive filter() calls further restrict the set of objects

...

In this second example (filter(A).filter(B)), the first filter restricted the queryset to (A). The second filter restricted the set of blogs further to those that are also (B). The entries select by the second filter may or may not be the same as the entries in the first filter.`




回答2:


These two style of filtering are equivalent in most cases, but when query on objects base on ForeignKey or ManyToManyField, they are slightly different.

Examples from the documentation.

model
Blog to Entry is a one-to-many relation.

from django.db import models

class Blog(models.Model):
    ...

class Entry(models.Model):
    blog = models.ForeignKey(Blog)
    headline = models.CharField(max_length=255)
    pub_date = models.DateField()
    ...

objects
Assuming there are some blog and entry objects here.

queries

Blog.objects.filter(entry__headline_contains='Lennon', 
    entry__pub_date__year=2008)
Blog.objects.filter(entry__headline_contains='Lennon').filter(
    entry__pub_date__year=2008)  

For the 1st query (single filter one), it match only blog1.

For the 2nd query (chained filters one), it filters out blog1 and blog2.
The first filter restricts the queryset to blog1, blog2 and blog5; the second filter restricts the set of blogs further to blog1 and blog2.

And you should realize that

We are filtering the Blog items with each filter statement, not the Entry items.

So, it's not the same, because Blog and Entry are multi-valued relationships.

Reference: https://docs.djangoproject.com/en/1.8/topics/db/queries/#spanning-multi-valued-relationships
If there is something wrong, please correct me.

Edit: Changed v1.6 to v1.8 since the 1.6 links are no longer available.




回答3:


As you can see in the generated SQL statements the difference is not the "OR" as some may suspect. It is how the WHERE and JOIN is placed.

Example1 (same joined table) :

(example from https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships)

Blog.objects.filter(entry__headline__contains='Lennon', entry__pub_date__year=2008)

This will give you all the Blogs that have one entry with both (entry_headline_contains='Lennon') AND (entry__pub_date__year=2008), which is what you would expect from this query. Result: Book with {entry.headline: 'Life of Lennon', entry.pub_date: '2008'}

Example 2 (chained)

Blog.objects.filter(entry__headline__contains='Lennon').filter(entry__pub_date__year=2008)

This will cover all the results from Example 1, but it will generate slightly more result. Because it first filters all the blogs with (entry_headline_contains='Lennon') and then from the result filters (entry__pub_date__year=2008).

The difference is that it will also give you results like: Book with {entry.headline: 'Lennon', entry.pub_date: 2000}, {entry.headline: 'Bill', entry.pub_date: 2008}

In your case

I think it is this one you need:

Book.objects.filter(inventory__user__profile__vacation=False, inventory__user__profile__country='BR')

And if you want to use OR please read: https://docs.djangoproject.com/en/dev/topics/db/queries/#complex-lookups-with-q-objects




回答4:


Sometimes you don't want to join multiple filters together like this:

def your_dynamic_query_generator(self, event: Event):
    qs \
    .filter(shiftregistrations__event=event) \
    .filter(shiftregistrations__shifts=False)

And the following code would actually not return the correct thing.

def your_dynamic_query_generator(self, event: Event):
    return Q(shiftregistrations__event=event) & Q(shiftregistrations__shifts=False)

What you can do now is to use an annotation count-filter.

In this case we count all shifts which belongs to a certain event.

qs: EventQuerySet = qs.annotate(
    num_shifts=Count('shiftregistrations__shifts', filter=Q(shiftregistrations__event=event))
)

Afterwards you can filter by annotation.

def your_dynamic_query_generator(self):
    return Q(num_shifts=0)

This solution is also cheaper on large querysets.

Hope this helps.



来源:https://stackoverflow.com/questions/8164675/chaining-multiple-filter-in-django-is-this-a-bug

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!