Simple Subquery with OuterRef

Posted by 牧云@^-^@ on 2019-11-29 21:33:42
Todor

One of the problems with your example is that you cannot use queryset.count() as a subquery, because .count() evaluates the queryset immediately and returns a number rather than a queryset.

So one may think that the right approach would be to use Count() instead. Maybe something like this:

Post.objects.annotate(
    count=Count(Tag.objects.filter(post=OuterRef('pk')))
)

This won't work, for two reasons:

  1. The Tag queryset selects all Tag fields, while Count can only count one field. So Tag.objects.filter(post=OuterRef('pk')).only('pk') is needed, to count on tag.pk alone.

  2. Count itself is not a Subquery class; Count is an Aggregate. The expression it generates is therefore not recognized as a subquery. We can fix that by wrapping the queryset in Subquery.

Applying the fixes for 1) and 2) would produce:

Post.objects.annotate(
    count=Count(Subquery(Tag.objects.filter(post=OuterRef('pk')).only('pk')))
)

However, if you inspect the query being produced:

SELECT 
    "tests_post"."id",
    "tests_post"."title",
    COUNT((SELECT U0."id" 
            FROM "tests_tag" U0 
            INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id") 
            WHERE U1."post_id" = ("tests_post"."id"))
    ) AS "count" 
FROM "tests_post" 
GROUP BY 
    "tests_post"."id",
    "tests_post"."title"

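Notice the GROUP BY clause. It is there because Count is an Aggregate. The grouping itself does not change the result here, but in other cases it may. Note also that COUNT here wraps a scalar subquery, which can supply at most one value per post (and some databases reject a scalar subquery that returns several rows), so this form cannot report the real number of tags. That's why the docs suggest a slightly different approach, in which the aggregation is moved into the subquery via a specific combination of values + annotate + values: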

from django.db.models import Count, OuterRef, Subquery

Post.objects.annotate(
    count=Subquery(
        Tag.objects.filter(post=OuterRef('pk'))
            # The first .values() call defines the GROUP BY clause.
            # Every field listed here must also be constrained by the filter;
            # otherwise there will be more than one group per outer row, the
            # subquery will return more than one row, and that is not allowed.
            # Here we group only by post, and we filter by post via OuterRef.
            .values('post')
            # Count how many rows there are in each group.
            .annotate(count=Count('pk'))
            # Return only the count.
            .values('count')
    )
)

Finally, this will produce:

SELECT 
    "tests_post"."id",
    "tests_post"."title",
    (SELECT COUNT(U0."id") AS "count" 
            FROM "tests_tag" U0 
            INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id") 
            WHERE U1."post_id" = ("tests_post"."id") 
            GROUP BY U1."post_id"
    ) AS "count" 
FROM "tests_post"
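
To see how the two SQL shapes above behave on real rows, here is a minimal sketch that runs them with Python's built-in sqlite3 module instead of Django. The table names mirror the SQL above; the sample rows and the tag_count alias are made up for illustration. The third query, with COALESCE, is an addition that does not appear in the answer. SQLite quietly takes the first row of a multi-row scalar subquery; other databases may raise an error instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tests_post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tests_tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tests_post_tags (
        id INTEGER PRIMARY KEY, post_id INTEGER, tag_id INTEGER
    );
    -- Hypothetical sample data: post 1 has two tags, post 2 has none.
    INSERT INTO tests_post VALUES (1, 'two tags'), (2, 'no tags');
    INSERT INTO tests_tag VALUES (1, 'django'), (2, 'orm');
    INSERT INTO tests_post_tags (post_id, tag_id) VALUES (1, 1), (1, 2);
""")

# Shape 1: COUNT() in the outer query, wrapped around a scalar subquery.
# The subquery contributes at most one value per post, so the count is 0 or 1.
outer_rows = conn.execute("""
    SELECT tests_post.id,
           COUNT((SELECT U0.id
                  FROM tests_tag U0
                  INNER JOIN tests_post_tags U1 ON (U0.id = U1.tag_id)
                  WHERE U1.post_id = tests_post.id)) AS tag_count
    FROM tests_post
    GROUP BY tests_post.id, tests_post.title
    ORDER BY tests_post.id
""").fetchall()

# Shape 2: the aggregation lives inside the subquery (values + annotate + values).
# A post without tags produces no group, so the subquery yields NULL.
grouped_rows = conn.execute("""
    SELECT tests_post.id,
           (SELECT COUNT(U0.id)
            FROM tests_tag U0
            INNER JOIN tests_post_tags U1 ON (U0.id = U1.tag_id)
            WHERE U1.post_id = tests_post.id
            GROUP BY U1.post_id) AS tag_count
    FROM tests_post
    ORDER BY tests_post.id
""").fetchall()

# Shape 3 (not in the answer): COALESCE turns the missing group into 0.
coalesced_rows = conn.execute("""
    SELECT tests_post.id,
           COALESCE((SELECT COUNT(U0.id)
                     FROM tests_tag U0
                     INNER JOIN tests_post_tags U1 ON (U0.id = U1.tag_id)
                     WHERE U1.post_id = tests_post.id
                     GROUP BY U1.post_id), 0) AS tag_count
    FROM tests_post
    ORDER BY tests_post.id
""").fetchall()

print(outer_rows)      # [(1, 1), (2, 0)]
print(grouped_rows)    # [(1, 2), (2, None)]
print(coalesced_rows)  # [(1, 2), (2, 0)]
```

Only the grouped subquery reports the real number of tags, and it reports NULL rather than 0 for a post with no tags. In Django, one common way to get the third shape is to wrap the Subquery in Coalesce from django.db.models.functions with a fallback of 0.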

The django-sql-utils package makes this kind of subquery aggregation simple. Just pip install django-sql-utils and then:

from sql_util.utils import SubqueryCount
posts = Post.objects.annotate(tag_count=SubqueryCount('tag'))

The API for SubqueryCount is the same as Count, but it generates a subselect in the SQL instead of joining to the related table.
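
One reason the subselect-versus-join distinction can matter: when two related tables are joined in the same query, the joined rows multiply and both counts come out inflated. The sketch below illustrates this in plain SQL through Python's built-in sqlite3 module. It does not use django-sql-utils, and the post, tag and comment tables (with simple foreign keys instead of a many-to-many table) are hypothetical, chosen only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one post with 2 tags and 3 comments.
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, post_id INTEGER);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER);
    INSERT INTO post VALUES (1, 'hello');
    INSERT INTO tag (post_id) VALUES (1), (1);
    INSERT INTO comment (post_id) VALUES (1), (1), (1);
""")

# Join-based counting: the two joins produce 2 * 3 = 6 rows for the post,
# so both COUNT() calls see 6 non-null values.
joined = conn.execute("""
    SELECT post.id, COUNT(tag.id), COUNT(comment.id)
    FROM post
    LEFT OUTER JOIN tag ON tag.post_id = post.id
    LEFT OUTER JOIN comment ON comment.post_id = post.id
    GROUP BY post.id
""").fetchall()

# Subselect-based counting: each count is computed independently.
subselected = conn.execute("""
    SELECT post.id,
           (SELECT COUNT(*) FROM tag WHERE tag.post_id = post.id),
           (SELECT COUNT(*) FROM comment WHERE comment.post_id = post.id)
    FROM post
""").fetchall()

print(joined)       # [(1, 6, 6)]
print(subselected)  # [(1, 2, 3)]
```

With a single Count over a single relation the join form is fine; the difference shows up once several aggregates over different relations are combined in one query.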
