Question
Prerequisites:
- Queryset must return Articles
- Queryset must return unique objects
- Must not use for loops that hit the database (meaning N queries for N objects to annotate)
My models:
class Report(BaseModel):
    ios_report = JSONField()
    android_report = JSONField()

class Article(BaseModel):
    internal_id = models.IntegerField(unique=True)
    title = models.CharField(max_length=500)
    short_title = models.CharField(max_length=500)
    picture_url = models.URLField()
    published_date = models.DateField()
    clip_link = models.URLField()
    reports = models.ManyToManyField(
        "Report", through="ArticleInReport", related_name="articles"
    )

class ArticleInReport(BaseModel):
    article = models.ForeignKey("core.Article", on_delete=models.CASCADE, related_name='articleinreports')
    report = models.ForeignKey("core.Report", on_delete=models.CASCADE, related_name='articleinreports')
    ios_views = models.IntegerField()
    android_views = models.IntegerField()

    @property
    def total_views(self):
        return self.ios_views + self.android_views
Everything starts with a Report object that is created at set intervals. This report contains data about articles and their respective views. A Report will have a relationship with an Article through ArticleInReport, which holds the total number of users in Article at the time the report was imported.
In my view, I need to display the following information:
- All articles that received views in the last 30 minutes.
- Each article annotated with the following information, and this is where I'm facing a problem: if present, the number of views the Article object had in the last Report; if not present, 0.
My views.py file:
reports_in_time_range = Report.objects.filter(created_date__range=[starting_range, right_now]).order_by('created_date')
last_report = reports_in_time_range.prefetch_related('articles').last()
unique_articles = Article.objects.filter(articleinreports__report__in=reports_in_time_range).distinct('id')
articles = Article.objects.filter(id__in=unique_articles).distinct('id').annotate(
    total_views=Case(
        When(id__in=last_report.articles.values_list('id', flat=True),
             then=F('articleinreports__ios_views') + F('articleinreports__android_views')),
        default=0, output_field=IntegerField(),
    ))
Some explanation of my thought process: first, get only the articles that appear in the relevant reports in the time range (filter(id__in=unique_articles)), and return only distinct articles. Next, if the article's ID appears in the last report's list of articles (through ArticleInReport, of course), calculate iOS views + Android views for that ArticleInReport.
The above annotation works for most Articles, but fails miserably for others for no apparent reason. I've tried many different approaches but always seem to get the wrong results.
Answer 1:
It's very important to avoid hits to the database, but not at this price. In my opinion you should split your query into two or more queries. Splitting the query improves readability and may also improve performance (sometimes two simple queries run faster than one complex query). Remember that you have all the power of dicts, comprehensions and itertools to massage your partial results.
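from django.db.models import ExpressionWrapper, F, IntegerField, Sum, Value

reports_in_time_range = (
    Report.objects
    .filter(created_date__range=[starting_range, right_now])
    .order_by('created_date')
)
last_report = reports_in_time_range.prefetch_related('articles').last()

# IDs of the articles that appear in the last report.
report_articles_ids = (
    Article.objects
    .filter(articleinreports__report=last_report)
    .values_list('id', flat=True)
    .distinct()
)

# Articles in the last report, annotated with their summed views.
# Note: this Sum() aggregates over every ArticleInReport row of the article,
# not only the row that belongs to last_report.
report_articles = (
    Article.objects
    .filter(id__in=report_articles_ids)
    .annotate(total_views=Sum(
        F('articleinreports__ios_views') +
        F('articleinreports__android_views'),
        output_field=IntegerField(),
    ))
)

# Every other article is annotated with a constant 0.
other_articles = (
    Article.objects
    .exclude(id__in=report_articles_ids)
    .annotate(total_views=ExpressionWrapper(
        Value(0),
        output_field=IntegerField(),
    ))
)

articles = report_articles | other_articles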
Answer 2:
I can see the problem with then=F('articleinreports__ios_views') + F('articleinreports__android_views'): it has no idea which ArticleInReport to use, so it will probably create a duplicate row for each ArticleInReport associated with each Article. As @daniherrera suggests, you can first get all the Articles you need, then fetch all ArticleInReport rows from the last report; that would be 3 queries. Then you can loop through the Articles: if there is an ArticleInReport for the Article, assign its view count; if not, assign zero. This works if you don't need any further SQL operations on total_views. You would probably want to build a dictionary of {Article.id: ArticleInReport} before the loop for easy lookup.
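The lookup-dictionary idea above can be sketched in plain Python. The ArticleRow and ViewRow dataclasses are hypothetical stand-ins for rows already fetched by the queries (they are not Django models), and annotate_total_views is an illustrative helper name, not something from the question.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ArticleRow:
    """Hypothetical stand-in for an Article already fetched from the database."""
    id: int
    title: str
    total_views: int = 0


@dataclass
class ViewRow:
    """Hypothetical stand-in for an ArticleInReport row of the last report."""
    article_id: int
    ios_views: int
    android_views: int


def annotate_total_views(articles: Iterable[ArticleRow],
                         last_report_rows: Iterable[ViewRow]) -> List[ArticleRow]:
    # Build the {article id: row} dictionary once, so the loop issues no queries.
    rows_by_article: Dict[int, ViewRow] = {r.article_id: r for r in last_report_rows}
    annotated = []
    for article in articles:
        row = rows_by_article.get(article.id)
        # Articles missing from the last report get 0 views.
        if row is not None:
            article.total_views = row.ios_views + row.android_views
        else:
            article.total_views = 0
        annotated.append(article)
    return annotated


articles = [ArticleRow(1, "a"), ArticleRow(2, "b"), ArticleRow(3, "c")]
last_rows = [ViewRow(article_id=1, ios_views=10, android_views=5),
             ViewRow(article_id=3, ios_views=0, android_views=7)]
result = annotate_total_views(articles, last_rows)
print([(a.id, a.total_views) for a in result])  # [(1, 15), (2, 0), (3, 7)]
```

The annotation lives only on the Python objects, so it cannot be used for further filtering or ordering in SQL.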
Another approach (if you need filtering, sorting, or similar) is to use a Subquery of ArticleInReport from the last report to add the total_views annotation to the Article queryset. You can then use Coalesce to replace NULL with zero when an Article received no views in the last Report.
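The SQL shape that such a Subquery plus Coalesce annotation aims for is a correlated subquery wrapped in COALESCE. Here is a minimal sketch using Python's built-in sqlite3, with simplified, hypothetical table names (article, article_in_report) rather than the real Django tables. In the ORM the same shape would be built from Subquery, OuterRef and Coalesce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE article_in_report (
        article_id INTEGER, report_id INTEGER,
        ios_views INTEGER, android_views INTEGER
    );
    INSERT INTO article VALUES (1, 'a'), (2, 'b');
    -- Article 1 appears in reports 1 and 2; article 2 only in report 1.
    INSERT INTO article_in_report VALUES
        (1, 1, 3, 4), (1, 2, 10, 5), (2, 1, 8, 1);
""")

last_report_id = 2  # assumed to have been looked up beforehand

rows = conn.execute(
    """
    SELECT a.id,
           COALESCE(
               (SELECT air.ios_views + air.android_views
                  FROM article_in_report AS air
                 WHERE air.article_id = a.id      -- correlated to the outer row
                   AND air.report_id = ?
                 LIMIT 1),
               0) AS total_views
      FROM article AS a
     ORDER BY a.id
    """,
    (last_report_id,),
).fetchall()
print(rows)  # [(1, 15), (2, 0)]
```

Because the subquery yields a single value per article, the outer query never joins to article_in_report, so each article appears exactly once.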
P.S. I think prefetch_related('articles') is useless, because you use values_list anyway.
P.P.S. You also don't need distinct for unique_articles and articles, because the __in lookup will already produce a distinct result.
Answer 3:
The problem with your approach is that you need to match only the exact ids; using IN will return a larger set than expected. You can use the reverse relation name directly to filter the Article objects. The use of distinct is also excessive.
articles_with_views_in_range = (
    Article.objects
    .annotate(
        total_views=Case(
            When(articleinreports__range=(start_range, end_range),
                 then=F('articleinreports__ios_views') + F('articleinreports__android_views')),
            default=0, output_field=IntegerField(),
        )
    ).filter(total_views__gt=0)
)
Source: https://stackoverflow.com/questions/52689949/django-complex-annotation