Django: Duplicated logic between properties and queryset annotations

我寻月下人不归 2020-12-31 10:48

When I define my business logic, I struggle to find the right way to do it, because I often need both a property AND a custom queryset to get the same information. I

5 answers
  • 2020-12-31 11:28

    To avoid any duplication, one option could be:

    • remove the property in the Model
    • use a custom Manager
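    • override its get_queryset() method:
    # managers.py
    from django.db import models
    from django.db.models import Count, Q
    
    
    class PickupTimeSlotManager(models.Manager):
    
        def get_queryset(self):
            # Order must be importable here; importing it inside this method
            # avoids a circular import with models.py.
            return super().get_queryset().annotate(
                db_nb_bookings=Count(
                    'order', filter=Q(order__status=Order.VALIDATED)
                )
            )
    
    # models.py
    from django.db import models
    from .managers import PickupTimeSlotManager
    
    class PickupTimeSlot(models.Model):
        ...
        # Add custom manager
        objects = PickupTimeSlotManager()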
    

    Advantage: the calculated property is transparently added to every queryset; no further action is required to use it.

    Disadvantage: the computational overhead occurs even when the calculated property is not used.

  • 2020-12-31 11:29

    Based on your various helpful answers, I decided to stick with annotations and properties. I created a caching mechanism so that the same name works whether or not the value was annotated. The main advantage is that the business logic lives in one place only. The only drawback I see is that an object may be fetched from the database a second time in order to be annotated. The performance impact stays minor, in my opinion.

    Here is a full example with 3 different attributes I need in my model. Feel free to comment to improve this.

    models.py

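    from django.db import models
    from django.db.models import BooleanField, Case, Count, F, Q, Value, When, query
    from safedelete.managers import SafeDeleteManager  # from the django-safedelete package
    
    from .decorators import annotate_to_property
    
    class PickupTimeSlotQuerySet(query.QuerySet):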
    
        def add_booking_data(self):
            return self \
                .prefetch_related('order_set') \
                .annotate(_nb_bookings=Count('order', filter=Q(order__status=Order.VALIDATED))) \
                .annotate(_nb_available_bookings=F('nb_max_bookings') - F('_nb_bookings')) \
                .annotate(_is_bookable=Case(When(_nb_bookings__lt=F('nb_max_bookings'),
                                                 then=Value(True)),
                                            default=Value(False),
                                            output_field=BooleanField())
                          ) \
                .order_by('start')
    
    class PickupTimeSlot(models.Model):
        objects = SafeDeleteManager.from_queryset(PickupTimeSlotQuerySet)()
       
        nb_max_bookings = models.PositiveSmallIntegerField()
        
        @annotate_to_property('add_booking_data', 'nb_bookings')
        def nb_bookings(self):
            pass
        
        @annotate_to_property('add_booking_data', 'nb_available_bookings')
        def nb_available_bookings(self):
            pass
        
        @annotate_to_property('add_booking_data', 'is_bookable')
        def is_bookable(self):
            pass
    

    decorators.py

    def annotate_to_property(queryset_method_name, key_name):
        """
        allow an annotated attribute to be used as property.
        """
        from django.apps import apps
    
        def decorator(func):
            def inner(self):
                attr = "_" + key_name
                if not hasattr(self, attr):
                    klass = apps.get_model(self._meta.app_label,
                                           self._meta.object_name)
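                    # Re-fetch this row through the annotating queryset method.
                    # getattr avoids eval() and also works for non-integer primary keys.
                    queryset = getattr(klass.objects, queryset_method_name)()
                    value = getattr(queryset.get(pk=self.pk), attr)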
                    setattr(self, attr, value)
    
                return getattr(self, attr)
    
            return property(inner)
    
        return decorator
    
  • 2020-12-31 11:30

    I don't think there is a silver bullet here. But I use this pattern in my projects for such cases.

    class PickupTimeSlotAnnotatedManager(models.Manager):
        def with_nb_bookings(self):
            return self.annotate(
                _nb_bookings=Count(
                    'order', filter=Q(order__status=Order.VALIDATED)
                )
            )
    
    class PickupTimeSlot(models.Model):
        ...
        annotated = PickupTimeSlotAnnotatedManager()
    
        @property
        def nb_bookings(self) -> int:
            """ How many times this time slot is booked? """ 
            if hasattr(self, '_nb_bookings'):
                return self._nb_bookings
            return self.order_set.validated().count()
    

    In code

    qs = PickupTimeSlot.annotated.with_nb_bookings()
    for item in qs:
        print(item.nb_bookings)
    

    This way I can always use the property: if the instance comes from the annotated queryset, the property returns the annotated value; if not, it calculates the value. This approach guarantees that I have full control over when to make a queryset "heavier" by annotating it with the required values. If I don't need that, I just use the regular PickupTimeSlot.objects. ...

    Also, if there are many such properties, you could write a decorator that wraps the property and simplifies the code. It would work like the cached_property decorator, except that it uses the annotated value when one is present.
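
    A minimal sketch of such a decorator, assuming the annotation is stored as a plain instance attribute (which is how Django exposes annotated values). The names annotated_property and Slot are hypothetical and only serve the illustration; Slot is a plain-Python stand-in for a model instance so the example runs without a database.

```python
_MISSING = object()


def annotated_property(annotation_name):
    """Build a property that prefers an annotated value over computing it."""

    def decorator(func):
        def getter(self):
            # Annotated values show up as ordinary instance attributes.
            value = getattr(self, annotation_name, _MISSING)
            if value is not _MISSING:
                return value
            # Not annotated: fall back to the (possibly expensive) computation.
            return func(self)

        getter.__doc__ = func.__doc__
        return property(getter)

    return decorator


class Slot:
    """Plain-Python stand-in for a model instance (demo only)."""

    def __init__(self, orders):
        self.orders = orders

    @annotated_property('_nb_bookings')
    def nb_bookings(self):
        """How many times this time slot is booked?"""
        return len(self.orders)
```

    Unlike cached_property, this sketch does not store the fallback result, so an un-annotated instance recomputes the value on every access; whether to cache it is a design choice left open here.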

  • 2020-12-31 11:37

    Here is an alternative way to achieve what you want:

    I usually add prefetch_related every time I write a queryset, so when I face this problem I solve it in Python.

    I loop over the data and count it in Python instead of doing it in SQL.

    class PickupTimeSlot(models.Model):
    
        @property
        def nb_bookings(self) -> int:
            """ How many times this time slot is booked? """ 
            orders = self.order_set.all()  # this won't hit the database if you already did the prefetch_related
            # filter() returns an iterator in Python 3, which has no len(); build a list instead
            validated_orders = [order for order in orders if order.status == Order.VALIDATED]
            return len(validated_orders)
    

    And the most important thing, prefetch_related:

    time_slots = PickupTimeSlot.objects.prefetch_related('order_set').all()
    

    You may wonder why I didn't use prefetch_related with a filtered queryset, so that Python doesn't need to filter again, like this:

    time_slots = PickupTimeSlot.objects.prefetch_related(
        Prefetch('order_set', queryset=Order.objects.filter(status=Order.VALIDATED))
    ).all()
    

    The answer is that sometimes we also need other information from the orders. The first approach costs nothing extra if we are going to prefetch them anyway.

    Hope this more or less helps you. Have a nice day!

  • 2020-12-31 11:43

    TL;DR

    • Do you need to filter the "annotated field" results?

      • If yes, keep the manager and use it when required. In any other situation, use the property logic.
      • If no, remove the manager/annotation and stick with the property implementation, unless your table is small (~1000 entries) and not growing over time.
    • The only advantage of the annotation approach I see here is the ability to filter the data at the database level.


    I conducted a few tests to reach this conclusion; here they are.

    Environment

    • Django 3.0.7
    • Python 3.8
    • PostgreSQL 10.14

    Model Structure

    For the sake of simplicity, I am using the model representation below.

    from django.db import models
    
    
    class ReporterManager(models.Manager):
        def article_count_qs(self):
            return self.get_queryset().annotate(
                annotate_article_count=models.Count('articles__id', distinct=True))
    
    
    class Reporter(models.Model):
        objects = models.Manager()
        counter_manager = ReporterManager()
        name = models.CharField(max_length=30)
    
        @property
        def article_count(self):
            return self.articles.distinct().count()
    
        def __str__(self):
            return self.name
    
    
    class Article(models.Model):
        headline = models.CharField(max_length=100)
        reporter = models.ForeignKey(Reporter, on_delete=models.CASCADE,
                                     related_name="articles")
    
        def __str__(self):
            return self.headline

    I populated the database, both the Reporter and Article models, with random strings.

    • Reporter rows ~220K (220514)
    • Article rows ~1M (997311)

    Test Cases

    1. Pick a random Reporter instance and retrieve its article count. We usually do this in a detail view.
    2. A paginated result. We slice the queryset and iterate over the slice.
    3. Filtering

    I am using the %timeit command of the IPython shell to measure the execution time.
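
    For readers without IPython, a comparable measurement can be sketched with the standard library's timeit module. The candidate function below is a stand-in workload (an assumption for illustration); in practice it would be replaced by one of the test functions from the test cases.

```python
import statistics
import timeit

LOOPS = 10  # calls per run
RUNS = 7    # mirrors the "7 runs" that %timeit reports


def candidate():
    # Stand-in workload (assumption); replace with the function under test.
    return sum(range(1000))


# Each entry is the total time for LOOPS calls; divide to get per-loop time.
totals = timeit.repeat(candidate, repeat=RUNS, number=LOOPS)
per_loop = [total / LOOPS for total in totals]

mean = statistics.mean(per_loop)
std_dev = statistics.stdev(per_loop)
print(f"{mean * 1e6:.1f} µs ± {std_dev * 1e6:.1f} µs per loop "
      f"(mean ± std. dev. of {RUNS} runs, {LOOPS} loops each)")
```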

    Test Case 1

    For this, I created these functions, which pick a random instance from the database:

    import random
    
    MAX_REPORTER = 220514
    
    
    def test_manager_random_picking():
        # Queryset indexes are 0-based, so valid positions are 0..MAX_REPORTER-1
        pos = random.randrange(MAX_REPORTER)
        return Reporter.counter_manager.article_count_qs()[pos].annotate_article_count
    
    
    def test_property_random_picking():
        pos = random.randrange(MAX_REPORTER)
        return Reporter.objects.all()[pos].article_count

    Results

    In [2]: %timeit test_manager_random_picking()
    8.78 s ± 6.1 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    In [3]: %timeit test_property_random_picking()
    6.36 ms ± 221 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    Test Case 2

    I created another two functions:

    import random
    
    PAGINATE_SIZE = 50
    
    
    def test_manager_paginate_iteration():
        start = random.randint(1, MAX_REPORTER - PAGINATE_SIZE)
        end = start + PAGINATE_SIZE
        qs = Reporter.counter_manager.article_count_qs()[start:end]
        for reporter in qs:
            reporter.annotate_article_count
    
    
    def test_property_paginate_iteration():
        start = random.randint(1, MAX_REPORTER - PAGINATE_SIZE)
        end = start + PAGINATE_SIZE
        qs = Reporter.objects.all()[start:end]
        for reporter in qs:
            reporter.article_count

    Results

    In [8]: %timeit test_manager_paginate_iteration()
    4.99 s ± 312 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    In [9]: %timeit test_property_paginate_iteration()
    47 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    Test Case 3

    Undoubtedly, annotation is the only way here.

    As you can see, the annotation approach takes a huge amount of time compared to the property implementation.
