Deciding and implementing a trending algorithm in Django

五迷三道, posted on 2019-11-29 09:36:44

Probably the simplest possible trending "algorithm" I can think of is the n-day moving average. I'm not sure how your data is structured, but say you have something like this:

books = {'Twilight': [500, 555, 580, 577, 523, 533, 556, 593],
         'Harry Potter': [650, 647, 653, 642, 633, 621, 625, 613],
         'Structure and Interpretation of Computer Programs': [1, 4, 15, 12, 7, 3, 8, 19]
        }

A simple moving average just takes the last n values and averages them:

def moving_av(l, n):
    """Take a list, l, and return the average of its last n elements.
    """
    observations = len(l[-n:])
    return sum(l[-n:]) / float(observations)

The slice l[-n:] simply grabs the tail end of the list, starting from the nth-to-last element. A moving average is a fairly standard way to smooth out the noise that a single spike or dip could introduce. The function could be used like so:

book_scores = {}
for book, reader_list in books.items():  # use iteritems() on Python 2
    book_scores[book] = moving_av(reader_list, 5)

You'll want to play around with the number of days you average over. If you want to emphasize recent trends, you can also look at something like a weighted moving average, which gives newer observations more weight than older ones.
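As an illustration, a linearly weighted moving average could be sketched as below. The function name weighted_moving_av and the choice of linear weights (1 for the oldest value in the window, up to n for the newest) are assumptions made for this example; other weighting schemes, such as exponential weights, are equally valid.

```python
def weighted_moving_av(values, n):
    """Linearly weighted average of the last n elements of values.

    The oldest element in the window gets weight 1 and the newest gets
    the largest weight (an illustrative choice, not the only one).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    window = values[-n:]
    if not window:
        raise ValueError("values must not be empty")
    weights = range(1, len(window) + 1)
    return sum(w * v for w, v in zip(weights, window)) / sum(weights)


sicp = [1, 4, 15, 12, 7, 3, 8, 19]
print(weighted_moving_av(sicp, 5))
```

For the sample list above, the recent jump to 19 pulls the weighted average above the plain 5-day average of 9.8.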

If you want to look less at absolute readership and more at increases in readership, find the percent change between the 5-day moving average and the 30-day moving average (note that with fewer than 30 observations, moving_av simply averages everything available):

d5_moving_av = moving_av(reader_list, 5)
d30_moving_av = moving_av(reader_list, 30)
book_score = (d5_moving_av - d30_moving_av) / d30_moving_av

With these simple tools you have a fair amount of flexibility in how much you emphasize past trends and how much you want to smooth out (or not smooth out) spikes.
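To make the percent-change idea concrete, here is a minimal sketch that ranks the sample books by it. Because the sample lists hold only 8 observations, it assumes a 3-day short window and an 8-day long window instead of 5 and 30; the names trend_score, short_n and long_n are made up for this example.

```python
def moving_av(values, n):
    """Average of the last n elements of values."""
    window = values[-n:]
    return sum(window) / len(window)


def trend_score(values, short_n=3, long_n=8):
    """Percent change of the short moving average relative to the long one.

    Assumes the long moving average is non-zero.
    """
    short_av = moving_av(values, short_n)
    long_av = moving_av(values, long_n)
    return (short_av - long_av) / long_av


books = {
    'Twilight': [500, 555, 580, 577, 523, 533, 556, 593],
    'Harry Potter': [650, 647, 653, 642, 633, 621, 625, 613],
    'Structure and Interpretation of Computer Programs': [1, 4, 15, 12, 7, 3, 8, 19],
}

# Highest relative growth first.
ranked = sorted(books, key=lambda title: trend_score(books[title]), reverse=True)
for title in ranked:
    print(title, round(trend_score(books[title]), 4))
```

With these windows the smallest book ranks first, since a relative score rewards growth rather than absolute readership; Harry Potter, whose readership is declining, gets a negative score.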

Popularity is easy; you just run a count on the readers and order by that:

from django.db.models import Count
Book.objects.annotate(reader_count=Count('readers')).order_by('-reader_count')

Trending is more difficult, as it is really a popularity delta, i.e. which books have gained the most readers recently. If you want something like this, you'll need something running behind the scenes to keep a record of reader counts by date.

dani herrera

You can take Stack Overflow's reputation ranking as an example.

Users can switch the view: by month, by year, and so on.

In your case: the most-read books by month, by year.

To achieve this, you should save the number of readers for each book day by day.

reader( date, book, total )

Then it is as simple as:

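from django.db.models import Sum

# Assumes the reader model has a ForeignKey to Book with no related_name,
# so the reverse lookup from Book is "reader".
Book.objects.filter(
    reader__date__gte=some_date
).annotate(
    num_readers=Sum('reader__total')
).order_by('-num_readers')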

I would do it systematically like this:

  1. Make a list of the most common questions or data points a user will be interested in, for example:
     1.1 Top 100 most popular books this week
     1.2 Top 100 most popular books this month

  2. After your daily reader/book information is updated, I would run a job (probably nightly) to update a table of this information. The table will probably have Book and ReaderDelta fields, where ReaderDelta is the change in readerCount over a week, month or year.

  3. You could also simply store the daily ReaderDelta and, when you need a month's worth of data, aggregate the past 30 days by date dynamically.
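The dynamic aggregation in step 3 can be sketched in plain Python, independent of how the daily deltas are stored. The function name windowed_delta and the dict-of-dates layout are assumptions for illustration; in a Django project the same sum would more likely be a filtered aggregate query.

```python
from datetime import date, timedelta


def windowed_delta(daily_deltas, today, days=30):
    """Sum the per-day reader deltas for the `days` days ending on `today`.

    daily_deltas maps a date to that day's change in reader count
    (a hypothetical layout chosen for this sketch).
    """
    start = today - timedelta(days=days - 1)
    return sum(delta for day, delta in daily_deltas.items()
               if start <= day <= today)


# Made-up sample data for one book.
deltas = {
    date(2019, 11, 29): 5,
    date(2019, 11, 28): 3,
    date(2019, 10, 1): 100,  # older than 30 days, so ignored by default
}

print(windowed_delta(deltas, date(2019, 11, 29)))           # past 30 days
print(windowed_delta(deltas, date(2019, 11, 29), days=7))   # past week
```

Storing only daily deltas keeps the nightly job trivial, at the cost of doing the summation at query time.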
