Querying data from Django

二次信任 提交于 2019-12-25 14:49:41

问题


Here's what my model structure looks like:

class Visitor(models.Model):
    id = models.AutoField(primary_key=True)

class Session(models.Model):
    id = models.AutoField(primary_key=True)
    visit = models.ForeignKey(Visitor)
    sequence_no = models.IntegerField(null=False)

class Track(models.Model):
    id = models.AutoField(primary_key=True)
    session = models.ForeignKey(Session)
    action = models.ForeignKey(Action)
    when = models.DateTimeField(null=False, auto_now_add=True)
    sequence_no = models.IntegerField(null = False)

class Action(models.Model):
    id = models.AutoField(primary_key=True)
    url = models.CharField(max_length=65535, null=False)
    host = models.IntegerField(null=False)

As you can see, each Visitor has multiple Sessions; each Session has multiple Tracks and each Track has one Action. Tracks are always ordered ascendingly by the session and the sequence_no. A Visitors average time on an site (i.e. a particular Action.host) is the difference in Track.when (time) between the highest and lowest Track.sequence_no divided by the number of Sessions of that Visitor.

I need to calculate the average time of visitors on the site which be the sum of the time for each visitor on the Action.site divided by the number of visitors.

I could query this using SQL but I'd like to keep my query as Djangonic as possible and I'm still very lost with complex queries.


回答1:


For a specific Action object you can gather interesting data about Sessions:

from django.db.models import Min, Max
from yourapp.models import *

host = 1  # I suppose you want to calculate for each site

sessions = list(Session.objects.filter(
    track__action__host=host,
).annotate(
    start=Min('track__when'),
    end=Max('track__when'), 
).values('visit_id', 'start', 'end'))

You will get something in the line of:

[ 
  { 'visit_id': 1, 'start': datetime(...), 'end': datetime(...) },
  { 'visit_id': 1, 'start': datetime(...), 'end': datetime(...) },
  { 'visit_id': 2, 'start': datetime(...), 'end': datetime(...) },
 ....
]

Now it's only a matter of getting the desired result from the data:

number_of_visitors = len(set(s['visit_id'] for s in sessions))
total_time = sum((s['end'] - s['start']).total_seconds() for s in sessions)
average_time_spent = total_time / number_of_visitors

Another way is to use two queries instead of one, and avoid the len(set(...)) snippet:

sessions = Session.objects.filter(
    track__action__host=host,
).annotate(
    start=Min('track__when'),
    end=Max('track__when'), 
)

number_of_visitors = sessions.values('visit_id').distict().count()
total_time = sum((s['end'] - s['start']).total_seconds() 
    for s in sessions.values('start', 'end'))

There is NO WAY to do actual calculated fields barring the provided aggregations, so either you do it in raw SQL or you do in code like this.

At least the proposed solution uses Django's ORM as far as possible.



来源:https://stackoverflow.com/questions/6954746/querying-data-from-django

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!