Django complex annotation

Submitted by 扶醉桌前 on 2019-12-20 03:00:10

Question


Pre-requisites:

  • Queryset must return Articles
  • Queryset must return unique objects
  • Must not utilize for loops that hit the database (meaning N queries for N objects to annotate)

my models:

class Report(BaseModel):
    ios_report = JSONField()
    android_report = JSONField()

class Article(BaseModel):

    internal_id = models.IntegerField(unique=True)
    title = models.CharField(max_length=500)
    short_title = models.CharField(max_length=500)
    picture_url = models.URLField()
    published_date = models.DateField()
    clip_link = models.URLField()
    reports = models.ManyToManyField(
        "Report", through="ArticleInReport", related_name="articles"
    )

class ArticleInReport(BaseModel):

    article = models.ForeignKey("core.Article", on_delete=models.CASCADE, related_name='articleinreports')
    report = models.ForeignKey("core.Report", on_delete=models.CASCADE, related_name='articleinreports')
    ios_views = models.IntegerField()
    android_views = models.IntegerField()


    @property
    def total_views(self):
        return self.ios_views + self.android_views

Everything starts with a Report object that is created at set intervals. This report contains data about articles and their respective views. A Report will have a relationship with an Article through ArticleInReport, which holds the total number of users in Article at the time the report was imported.

In my view, I need to display the following information:

  • All articles that received views in the last 30 minutes.
  • With each article annotated with the following information, and this is where I'm facing a problem:

If present, the number of views the Article object had in the last Report. If not present, 0.

my views.py file:

reports_in_time_range = Report.objects.filter(created_date__range=[starting_range, right_now]).order_by('created_date')

last_report = reports_in_time_range.prefetch_related('articles').last()
unique_articles = Article.objects.filter(articleinreports__report__in=reports_in_time_range).distinct('id')

articles = Article.objects.filter(id__in=unique_articles).distinct('id').annotate(
    total_views=Case(
            When(id__in=last_report.articles.values_list('id', flat=True),
                 then=F('articleinreports__ios_views') + F('articleinreports__android_views')),
            default=0, output_field=IntegerField(),
    ))

Some explanation for my thought process: first, get me only articles that appear in the relevant reports in the time range (filter(id__in=unique_articles)), return only distinct articles. Next, if the article's ID appears in the last report's list of articles (through ArticleInReport of course), calculate iOS views + Android views for that ArticleInReport.

The annotation above works for most Articles but returns wrong results for others, for no apparent reason. I've tried many different approaches but always seem to get the wrong results.


Answer 1:


Avoiding database hits is important, but not at this price. In my opinion, you should split your query into two or more queries. Splitting improves readability and may also improve performance (sometimes two simple queries run faster than one complex query). Remember that you have the full power of dicts, comprehensions, and itertools to massage your partial results.

reports_in_time_range = ( Report
                         .objects
                         .filter(created_date__range=[starting_range, right_now])
                         .order_by('created_date'))

last_report = reports_in_time_range.prefetch_related('articles').last()

report_articles_ids = ( Article
                       .objects
                       .filter(articleinreports__report=last_report)
                       .values_list('id', flat=True)
                       .distinct())

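report_articles = ( Article
                   .objects
                   .filter(id__in=report_articles_ids)
                   # Sum only the rows of the last report (needs Q from django.db.models,
                   # Django 2.0+); without the filter, views from every report are added together.
                   .annotate( total_views=Sum(
                                   F('articleinreports__ios_views') +
                                   F('articleinreports__android_views'),
                                   filter=Q(articleinreports__report=last_report),
                                   output_field=IntegerField()
                   )))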

other_articles = ( Article
                   .objects
                   .exclude(id__in=report_articles_ids)
                   .annotate( total_views=ExpressionWrapper(
                                    Value(0),
                                    output_field=IntegerField())
                   ))

articles = report_articles | other_articles



Answer 2:


The problem I see is with then=F('articleinreports__ios_views') + F('articleinreports__android_views'): the expression does not say which ArticleInReport to use, so the join will probably produce one row per ArticleInReport associated with each Article. As @daniherrera suggests, you can first get all the Articles you need, then fetch all ArticleInReport rows from the last report; that would be 3 queries. Then loop through the Articles: if an Article has an ArticleInReport, assign its view count, otherwise assign zero. This works if you don't need any further SQL operations on total_views. You would probably want to build a dictionary of {Article.id: ArticleInReport} before the loop for easy lookup.
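
The dictionary lookup described above can be sketched in plain Python. This is only an illustration: `ArticleRow` and `ArticleInReportRow` are hypothetical stand-ins for objects already fetched from the database (they are not the real models), and `attach_last_report_views` is a name made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ArticleRow:
    """Hypothetical stand-in for an already-fetched Article instance."""
    id: int
    title: str
    total_views: int = 0


@dataclass
class ArticleInReportRow:
    """Hypothetical stand-in for an ArticleInReport row of the last report."""
    article_id: int
    ios_views: int
    android_views: int


def attach_last_report_views(
    articles: List[ArticleRow], last_report_rows: List[ArticleInReportRow]
) -> List[ArticleRow]:
    # Build {article_id: row} once, so the loop below does no further lookups
    # against the database and each access is a constant-time dict hit.
    by_article: Dict[int, ArticleInReportRow] = {
        row.article_id: row for row in last_report_rows
    }
    for article in articles:
        row = by_article.get(article.id)
        if row is not None:
            article.total_views = row.ios_views + row.android_views
        else:
            # The article is not part of the last report: zero views.
            article.total_views = 0
    return articles


articles = [ArticleRow(1, "a"), ArticleRow(2, "b"), ArticleRow(3, "c")]
last_rows = [ArticleInReportRow(1, 10, 5), ArticleInReportRow(3, 0, 7)]
views = {a.id: a.total_views for a in attach_last_report_views(articles, last_rows)}
```

With real querysets, the two input lists would presumably come from one query for the Articles and one for the last report's ArticleInReport rows, so the number of queries stays fixed regardless of how many Articles there are.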

Another approach (if you need filtering, sorting, or other database-side operations) is to use a Subquery over the ArticleInReport rows of the last report to add the total_views annotation to the Article queryset. You can then use Coalesce to replace NULL with zero when an Article received no views in the last Report.
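
To make the idea concrete without a Django project, here is a rough sketch of the SQL shape such an annotation would correspond to, run against an in-memory SQLite database. The table and column names are simplified assumptions rather than the names Django would generate. In ORM terms the inner SELECT would presumably be a `Subquery` filtered on `article=OuterRef('pk')` and the last report, wrapped in `Coalesce(..., 0)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE articleinreport (
        id INTEGER PRIMARY KEY,
        article_id INTEGER, report_id INTEGER,
        ios_views INTEGER, android_views INTEGER
    );
    INSERT INTO article VALUES (1, 'a'), (2, 'b');
    -- Report 1 is older; report 2 plays the role of the last report.
    INSERT INTO articleinreport VALUES
        (1, 1, 1, 3, 4),
        (2, 2, 1, 1, 1),
        (3, 1, 2, 10, 5);
""")

last_report_id = 2

# A plain join yields one row per ArticleInReport, so article 1 appears twice.
joined = conn.execute("""
    SELECT a.id, air.ios_views + air.android_views
    FROM article AS a
    JOIN articleinreport AS air ON air.article_id = a.id
    ORDER BY a.id, air.id
""").fetchall()

# A correlated subquery restricted to the last report yields one row per
# article; COALESCE turns the NULL of a missing row into 0.
annotated = conn.execute("""
    SELECT a.id,
           COALESCE((SELECT air.ios_views + air.android_views
                     FROM articleinreport AS air
                     WHERE air.article_id = a.id AND air.report_id = ?
                     LIMIT 1), 0) AS total_views
    FROM article AS a
    ORDER BY a.id
""", (last_report_id,)).fetchall()

print(joined)     # [(1, 7), (1, 15), (2, 2)]
print(annotated)  # [(1, 15), (2, 0)]
```

Because the subquery is correlated per article rather than joined, the outer query keeps exactly one row per Article, and the result can still be filtered or ordered by total_views in the database.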

P.S. I think prefetch_related('articles') is useless, because you use values_list anyway. P.P.S. You also don't need distinct on unique_articles and articles: an id__in lookup by itself returns each matching Article only once.




Answer 3:


The problem with your approach is that you need to match only the exact ids; using IN returns a larger set than expected. You can use the reverse relation name directly to filter the Article objects. The query also uses distinct excessively.

articles_with_views_in_range = (
    Article.objects
        .annotate(
              total_views=Case(
                  # Assumes the range applies to the linked Report's created_date,
                  # the same field the question filters on.
                  When(articleinreports__report__created_date__range=(start_range, end_range),
                       then=F('articleinreports__ios_views') + F('articleinreports__android_views')),
                  default=0, output_field=IntegerField(),
              )
        ).filter(total_views__gt=0)
)


Source: https://stackoverflow.com/questions/52689949/django-complex-annotation
