问题
To keep it simple I have four tables(A, B, Category and Relation), Relation table stores the Intensity
of A in B and Category stores the type of B.
A <--- Relation ---> B ---> Category
(So the relation between A and B is n to n, when the relation between B and Category is n to 1)
I need an ORM to group Relation records by Category and A, then calculate Sum
of Intensity
in each (Category, A) (seems simple till here), then I want to annotate Max of calculated Sum
in each Category.
My code is something like:
A.objects.values('B_id').annotate(AcSum=Sum(Intensity)).annotate(Max(AcSum))
Which throws the error:
django.core.exceptions.FieldError: Cannot compute Max('AcSum'): 'AcSum' is an aggregate
Django-group-by package with the same error.
For further information please also see this stackoverflow question.
I am using Django 2 and PostgreSQL.
Is there a way to achieve this using ORM, if there is not, what would be the solution using raw SQL expression?
Update
After lots of struggling I found out that what I wrote was indeed an aggregation, however what I want is to find out the maximum of AcSum of each A in each category. So I suppose I have to group-by the result once more after AcSum Calculation. Based on this insight I found a stack-overflow question which asks the same concept(The question was asked 1 year, 2 months ago without any accepted answer). Chaining another values('id') to the set does not function neither as a group_by nor as a filter for output attributes, It removes AcSum from the set. Adding AcSum to values() is also not an option due to changes in the grouped by result set. I think what I am trying to do is re grouping the grouped by query based on the fields inside a column (i.e id). any thoughts?
回答1:
You can't do an aggregate of an aggregate Max(Sum())
, it's not valid in SQL, whether you're using the ORM or not. Instead, you have to join the table to itself to find the maximum. You can do this using a subquery. The below code looks right to me, but keep in mind I don't have something to run this on, so it might not be perfect.
from django.db.models import Subquery, OuterRef
annotation = {
'AcSum': Sum('intensity')
}
# The basic query is on Relation grouped by A and Category, annotated
# with the Sum of intensity
query = Relation.objects.values('a', 'b__category').annotate(**annotation)
# The subquery is joined to the outerquery on the Category
sub_filter = Q(b__category=OuterRef('b__category'))
# The subquery is grouped by A and Category and annotated with the Sum
# of intensity, which is then ordered descending so that when a LIMIT 1
# is applied, you get the Max.
subquery = Relation.objects.filter(sub_filter).values('a', 'b__category').annotate(**annotation).order_by('-AcSum').values('AcSum')[:1]
query = query.annotate(max_intensity=Subquery(subquery))
This should generate SQL like:
SELECT a_id, category_id,
(SELECT SUM(U0.intensity) AS AcSum
FROM RELATION U0
JOIN B U1 on U0.b_id = U1.id
WHERE U1.category_id = B.category_id
GROUP BY U0.a_id, U1.category_id
ORDER BY SUM(U0.intensity) DESC
LIMIT 1
) AS max_intensity
FROM Relation
JOIN B on Relation.b_id = B.id
GROUP BY Relation.a_id, B.category_id
It may be more performant to eliminate the join in Subquery by using a backend specific feature like array_agg (Postgres) or GroupConcat (MySQL) to collect the Relation.ids that are grouped together in the outer query. But I don't know what backend you're using.
回答2:
Something like this should work for you. I couldn't test it myself, so please let me know the result:
Relation.objects.annotate(
b_category=F('B__Category')
).values(
'A', 'b_category'
).annotate(
SumInensityPerCategory=Sum('Intensity')
).values(
'A', MaxIntensitySumPerCategory=Max('SumInensityPerCategory')
)
来源:https://stackoverflow.com/questions/48226944/calculate-max-of-sum-of-an-annotated-field-over-a-grouped-by-query-in-django-orm