Hot content algorithm / score with time decay

前端 未结 1 1249
离开以前
离开以前 2020-12-12 10:08

I have been reading + researching on algorithms and formulas to work out a score for my user submitted content to display currently hot / trending items higher up the list,

1条回答
  •  轻奢々
    轻奢々 (楼主)
    2020-12-12 11:10

    Reddits old formula and a little drop off

    Basically you can use Reddit's formula. Since your system only supports upvotes you could weight them, resulting in something like this:

    def hotness(track)
        s = track.playedCount
        s = s + 2*track.downloadCount
        s = s + 3*track.likeCount
        s = s + 4*track.favCount
        baseScore = log(max(s,1))
    
        timeDiff = (now - track.uploaded).toWeeks
    
        if(timeDiff > 1)
            x = timeDiff - 1
            baseScore = baseScore * exp(-8*x*x)
    
        return baseScore
    

    The factor exp(-8*x*x) will give you your desired drop off:

    Exponential drop off

    The basics behind

    You can use any function that goes to zero faster than your score goes up. Since we use log on our score, even a linear function can get multiplied (as long as your score doesn't grow exponentially).

    So all you need is a function that returns 1 as long as you don't want to modify the score, and drops afterwards. Our example above forms that function:

    multiplier(x) = x > 1 ? exp(-8*x*x) : 1
    

    You can vary the multiplier if you want less steep curves.

    Example in C++

    Lets say that the probability for a given track to be played in a given hour is 50%, download 10%, like 1% and favorite 0.1%. Then the following C++ program will give you an estimate for your scores behavior:

    #include 
    #include 
    #include 
    #include 
    #include 
    
    struct track{
        track() : uploadTime(0),playCount(0),downCount(0),likeCount(0),faveCount(0){}
        std::time_t uploadTime;    
        unsigned int playCount;
        unsigned int downCount;
        unsigned int likeCount;
        unsigned int faveCount;    
        void addPlay(unsigned int n = 1){ playCount += n;}
        void addDown(unsigned int n = 1){ downCount += n;}
        void addLike(unsigned int n = 1){ likeCount += n;}
        void addFave(unsigned int n = 1){ faveCount += n;}
        unsigned int baseScore(){
            return  playCount +
                2 * downCount +
                3 * likeCount +
                4 * faveCount;
        }
    };
    
    int main(){
        track test;
        const unsigned int dayLength = 24 * 3600;
        const unsigned int weekLength = dayLength * 7;    
    
        std::mt19937 gen(std::time(0));
        std::bernoulli_distribution playProb(0.5);
        std::bernoulli_distribution downProb(0.1);
        std::bernoulli_distribution likeProb(0.01);
        std::bernoulli_distribution faveProb(0.001);
    
        std::ofstream fakeRecord("fakeRecord.dat");
        std::ofstream fakeRecordDecay("fakeRecordDecay.dat");
        for(unsigned int i = 0; i < weekLength * 3; i += 3600){
            test.addPlay(playProb(gen));
            test.addDown(downProb(gen));
            test.addLike(likeProb(gen));
            test.addFave(faveProb(gen));    
    
            double baseScore = std::log(std::max(1,test.baseScore()));
            double timePoint = static_cast(i)/weekLength;        
    
            fakeRecord << timePoint << " " << baseScore << std::endl;
            if(timePoint > 1){
                double x = timePoint - 1;
                fakeRecordDecay << timePoint << " " << (baseScore * std::exp(-8*x*x)) << std::endl;
            }
            else
                fakeRecordDecay << timePoint << " " << baseScore << std::endl;
        }
        return 0;
    }
    

    Result:

    Decay

    This should be sufficient for you.

    0 讨论(0)
提交回复
热议问题