How are Reddit and Hacker News ranking algorithms used?

Submitted by 两盒软妹~` on 2019-12-02 14:15:34

Reddit uses Pyrex: its sorting algorithm is written as a Python C extension to improve performance.

So you can do the same thing in SQL whenever a record is updated, for example when it is upvoted or downvoted.

Here is the algorithm, which you will need to translate into your SQL engine's syntax:

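from datetime import datetime
from math import log10

def hot(ups, downs, date):
    """Reddit's 'hot' score; date is a naive datetime in UTC."""
    score = ups - downs
    # Net votes on a log scale: the first 10 votes weigh as much as the next 90.
    order = log10(max(abs(score), 1))
    # Direction of the net vote: 1, -1 or 0.
    if score > 0:
        sign = 1
    elif score < 0:
        sign = -1
    else:
        sign = 0
    # Seconds since the Unix epoch, minus a fixed offset (1134028003).
    seconds = (date - datetime(1970, 1, 1)).total_seconds() - 1134028003
    # Each 45000 s (12.5 hours) of newer submission time is worth one order of magnitude of votes.
    return round(order + sign * seconds / 45000, 7)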

So you need to store the ups, downs, date and the result of the hot function in the posts table, and then sort on the hot column.
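A minimal sketch of that approach, using Python's built-in sqlite3 module in place of the poster's SQL engine (an assumption): the hot score lives in its own column, is recomputed in the same UPDATE that records a vote, and the front page simply sorts on that column. The table layout and the names `posts`, `reddit_hot`, `add_post`, `vote` and `front_page` are hypothetical, and `created_at` is stored as Unix seconds rather than a datetime, so the hot function is restated to take seconds directly.

```python
import sqlite3
from math import log10

def reddit_hot(ups, downs, created_epoch):
    """Reddit's hot formula, with the creation time given as Unix seconds (assumption)."""
    score = ups - downs
    order = log10(max(abs(score), 1))
    sign = (score > 0) - (score < 0)  # 1, -1 or 0
    seconds = created_epoch - 1134028003
    return round(order + sign * seconds / 45000, 7)

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL so INSERT/UPDATE statements can call it.
conn.create_function("reddit_hot", 3, reddit_hot)
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL,"
    " ups INTEGER NOT NULL DEFAULT 0, downs INTEGER NOT NULL DEFAULT 0,"
    " created_at REAL NOT NULL, hot REAL NOT NULL)"
)
# An index on the stored score lets ORDER BY hot DESC read rows in order instead of sorting.
conn.execute("CREATE INDEX posts_hot_idx ON posts (hot)")

def add_post(title, created_epoch):
    cur = conn.execute(
        "INSERT INTO posts (title, created_at, hot) VALUES (?, ?, reddit_hot(0, 0, ?))",
        (title, created_epoch, created_epoch),
    )
    return cur.lastrowid

def vote(post_id, up):
    # SET expressions read the old row, so pass the already-incremented counts to reddit_hot.
    if up:
        sql = ("UPDATE posts SET ups = ups + 1,"
               " hot = reddit_hot(ups + 1, downs, created_at) WHERE id = ?")
    else:
        sql = ("UPDATE posts SET downs = downs + 1,"
               " hot = reddit_hot(ups, downs + 1, created_at) WHERE id = ?")
    conn.execute(sql, (post_id,))

def front_page(limit=25):
    rows = conn.execute("SELECT title FROM posts ORDER BY hot DESC LIMIT ?", (limit,))
    return [title for (title,) in rows]

# Example: three hypothetical posts submitted 3 * 45000 s apart.
old = add_post("old", 1_500_000_000)
new = add_post("new", 1_500_000_000 + 3 * 45000)
quiet = add_post("quiet", 1_500_000_000 + 6 * 45000)
for _ in range(100):
    vote(old, True)  # 100 net votes -> order 2
vote(new, True)      # 1 net vote -> order 0, but 3 "orders" newer
print(front_page())  # ['new', 'old', 'quiet']
```

Note that a post with zero net votes always scores 0 under this formula (its sign is 0), which is why "quiet" ends up last despite being the newest.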

You can see the Reddit source code here: http://code.reddit.com/

I implemented an SQL version of Reddit's ranking algorithm for a video aggregator like so:

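SELECT id, title
FROM videos
ORDER BY
    -- log-scaled net votes, keeping their sign
    LOG10(ABS(cached_votes_total) + 1) * SIGN(cached_votes_total)
    -- recency boost: 300000 s (about 3.5 days) is worth one order of magnitude of votes
    + (UNIX_TIMESTAMP(created_at) / 300000) DESC
LIMIT 50;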

cached_votes_total is updated by a trigger whenever a new vote is cast. This runs fast enough on our current site, but I am planning to add a ranking value column and update it from the same trigger that maintains cached_votes_total. After that optimization the query can sort on a stored column instead of evaluating the expression for every row, so it should be fast enough for almost any size of site.
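The current setup described above can be sketched with Python's sqlite3 module standing in for the poster's database (an assumption). The trigger keeps `cached_votes_total` in sync as vote rows are inserted, and the ORDER BY expression is registered as a hypothetical SQL function, `video_rank`, rather than relying on SQLite's optional math functions. The `votes` table, its `value` column and the helper names are hypothetical, and `created_at` is stored directly as Unix seconds in place of MySQL's `UNIX_TIMESTAMP(created_at)`.

```python
import sqlite3
from math import log10

def video_rank(votes_total, created_epoch):
    """Mirror of the ORDER BY expression, with created_at stored as Unix seconds."""
    sign = (votes_total > 0) - (votes_total < 0)
    return log10(abs(votes_total) + 1) * sign + created_epoch / 300000

conn = sqlite3.connect(":memory:")
conn.create_function("video_rank", 2, video_rank)
conn.executescript("""
    CREATE TABLE videos (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        created_at INTEGER NOT NULL,  -- Unix seconds (assumption)
        cached_votes_total INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE votes (
        id INTEGER PRIMARY KEY,
        video_id INTEGER NOT NULL REFERENCES videos(id),
        value INTEGER NOT NULL CHECK (value IN (-1, 1))
    );
    -- Keep the cached total in sync every time a vote row is inserted.
    CREATE TRIGGER votes_after_insert AFTER INSERT ON votes
    BEGIN
        UPDATE videos
        SET cached_votes_total = cached_votes_total + NEW.value
        WHERE id = NEW.video_id;
    END;
""")

def cast_vote(video_id, value):
    conn.execute("INSERT INTO votes (video_id, value) VALUES (?, ?)", (video_id, value))

def top_videos(limit=50):
    rows = conn.execute(
        "SELECT title FROM videos"
        " ORDER BY video_rank(cached_votes_total, created_at) DESC LIMIT ?",
        (limit,),
    )
    return [title for (title,) in rows]

# Example: three hypothetical videos posted 300000 s apart (ids 1, 2, 3).
for title, created in [("old", 1_500_000_000), ("new", 1_500_300_000), ("sunk", 1_500_600_000)]:
    conn.execute("INSERT INTO videos (title, created_at) VALUES (?, ?)", (title, created))
for _ in range(99):
    cast_vote(1, 1)   # old: 5000 + log10(100) = 5002.0
for _ in range(30):
    cast_vote(2, 1)   # new: 5001 + log10(31) ≈ 5002.49
for _ in range(5):
    cast_vote(3, -1)  # sunk: 5002 - log10(6) ≈ 5001.22
print(top_videos())   # ['new', 'old', 'sunk']
```

Because the time term is fixed once a video exists, the whole rank only changes when a vote arrives, which is what makes storing it in a column maintained by the trigger safe.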

Edit: more information at "Reddit Hotness Algorithm in SQL".
