Duplicate elements in Django Paginate after `order_by` call

匿名 (未验证) 提交于 2019-12-03 08:36:05

问题:

I'm using Django 1.7.7.

I'm wondering if anyone has experienced this. This is my query:

events = Event.objects.filter(     Q(date__gt=my_date) | Q(date__isnull=True) ).filter(type__in=[...]).order_by('date') 

When I try to then paginate it

p = Paginator(events, 10) p.count  # Gives 91  event_ids = [] for i in xrange(1, p.count / 10 + 2):     event_ids += [i.id for i in p.page(i)]  print len(event_ids)  # Still 91 print len(set(event_ids))  # 75 

I noticed that if I removed the .order_by, I don't get any duplicates. I then tried just .order_by with Event.objects.all().order_by('date') which gave no duplicates.

Finally, I tried this:

events = Event.objects.filter(     Q(date__gt=my_date) | Q(date__isnull=True) ).order_by('date')  p = Paginator(events, 10) events.count()  # Gives 131 p.count  # Gives 131  event_ids = [] for i in xrange(1, p.count / 10 + 2):     event_ids += [i.id for i in p.page(i)]  len(event_ids)  # Gives 131 len(set(event_ids))  # Gives 118 

... and there are duplicates. Can anyone explain what's going on?

I dug into the Django source (https://github.com/django/django/blob/master/django/core/paginator.py#L46-L55) and it seems to be something to do with how Django slices the object_list.

Any help is appreciated. Thanks.

Edit: distinct() has no affect on the duplicates. There aren't any duplicates in the database and I don't think the query introduces any duplicates ([e for e in events.iterator()] doesn't produce any duplicates). It's just when the Paginator is slicing.

Edit2: Here's a more complete example

In [1]: from django.core.paginator import Paginator  In [2]: from datetime import datetime, timedelta  In [3]: my_date = timezone.now()  In [4]:   1 events = Event.objects.filter(           2     Q(date__gt=my_date) | Q(date__isnull=True)           3 ).order_by('date')  In [5]: events.count() Out[5]: 134  In [6]: p = Paginator(events, 10)  In [7]: p.count Out[7]: 134  In [8]: event_ids = []  In [9]:   1 for i in xrange(1, p.num_pages + 1):           2     event_ids += [j.id for j in p.page(i)]  In [10]: len(event_ids) Out[10]: 134  In [11]: len(set(event_ids)) Out[11]: 115 

回答1:

oh, shot in the dark, but i think i might know what it is. i wasn't able to reproduce it in sqlite but using mysql. i think mysql trying to sort on a column that has the same value has it returning the same results during slicing

the pagination splicing basically does an sql statement of SELECT ... FROM ... WHERE (date > D OR date IS NULL) ORDER BY date ASC LIMIT X OFFSET X

But when date is null I'm not sure how mysql sorts it. So when I tried two sql queries of LIMIT 10 and LIMIT 10 OFFSET 10 it returned sets that had the same rows, while LIMIT 20 produce a unique set.

you can try to update your order_by to order_by('id', 'date') to have it sort by a unique field first and it may fix it.



标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!