Popular Today, This Week, This Month - Design Pattern

天命终不由人 2021-01-02 11:23

I have a system that displays entries ordered by one of three fields: most popular Today, This Week, or This Month. Each time an entry is viewed, its score is incremented.

3 answers
  • 2021-01-02 11:56

    One simple solution would be

    Use an array of 31 daily view counts, oldest first.
    Today - the last value.
    This Week - the sum of the last 7 values.
    This Month - the sum of all 31 values.
    
    At the end of each day, shift every value one position toward the front (dropping the oldest) to make room for the new day's count.
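As a rough illustration of the array described above, here is a minimal Python sketch. The class name `DailyWindow` and its method names are made up for this example; a `deque` with `maxlen` stands in for the manual shift.

```python
from collections import deque


class DailyWindow:
    """Rolling window of daily view counts for one entry (hypothetical helper)."""

    def __init__(self, days: int = 31):
        # Oldest day on the left, today on the right.
        self.counts = deque([0] * days, maxlen=days)

    def record_view(self) -> None:
        # A view always lands in today's slot.
        self.counts[-1] += 1

    def end_of_day(self) -> None:
        # Appending to a full deque drops the oldest day automatically,
        # which is the "shift by 1" step.
        self.counts.append(0)

    def today(self) -> int:
        return self.counts[-1]

    def this_week(self) -> int:
        return sum(list(self.counts)[-7:])

    def this_month(self) -> int:
        return sum(self.counts)
```

Calling `end_of_day()` from a scheduled job is one option; another would be to compare the current date with the last update on each view and shift lazily.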
    

    With respect to your comment,

    Use a second array (Array2) of size 24 to store the visit count for each hour of the current day; Array1 is the 31-element daily array.
    Today - the sum of all elements of Array2.
    This Week - the sum of the last 7 values of Array1.
    This Month - the sum of all elements of Array1.
    
    At the end of each day, shift the values of Array1 by 1 to make room
    for the new value: the finished day's visit count, which is the sum of all elements of Array2.
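The two-array variant could look roughly like this in Python. `HourlyDailyCounter` is a hypothetical name, and resetting the hourly array after the rollover is an assumption: the answer implies it but does not state it.

```python
class HourlyDailyCounter:
    """Hourly counts for the current day plus a 31-day history (hypothetical helper)."""

    def __init__(self, days: int = 31):
        self.hourly = [0] * 24    # "Array2": one slot per hour of the current day
        self.daily = [0] * days   # "Array1": completed days, oldest first

    def record_view(self, hour: int) -> None:
        self.hourly[hour] += 1

    def today(self) -> int:
        return sum(self.hourly)

    def this_week(self) -> int:
        # As described in the answer: the last 7 completed days.
        return sum(self.daily[-7:])

    def this_month(self) -> int:
        return sum(self.daily)

    def end_of_day(self) -> None:
        # Shift Array1 by one and store the finished day's total at the end.
        self.daily = self.daily[1:] + [sum(self.hourly)]
        # Assumption: start the new day with empty hourly counters.
        self.hourly = [0] * 24
```

Note that with this layout the weekly and monthly sums cover completed days only; add `today()` to them if the current day should count as well.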
    
  • 2021-01-02 12:04

    Maybe some kind of attenuation might help. You'd need 6 variables for Today, Yesterday, ThisWeek, LastWeek, ThisMonth, LastMonth.

    Then the final rating (for instance, the daily one) may be calculated as: Today + Yesterday * attenuation( current_time - start_of_the_day ).

    Here attenuation is something like 1 / (1 + k * time)^2, where k is adjustable depending on how fast you want the previous day's rating to deflate.

    UPDATE: Suppose a new entry was viewed 123 times during its first day, and let's measure time in seconds to get concrete numbers. At 23:59 the entry's rating would be 123 + 0 * 1 / (1 + k * 86340)^2 = 123.

    At midnight Today counter becomes Yesterday:

    0 + 123 * 1 / ( 1 + k * 0)^2 = 123
    

    Suppose by midday an entry gains 89 more views.

    89 + 123 * 1 / ( 1 + k * 43200 )^2 = ?
    

    Now is a good time to choose k. If we want old views to fade to a quarter of their weight in 12 hours, then k = 1/43200. If we want them to fade to a hundredth, k = 9/43200. In the latter case:

    89 + 123 * 1 / ( 1 + 9 )^2 = 90.23
    

    Then move forward to 23:59. Suppose the entry gains 60 more views, for 149 today:

    149 + 123 * 1 / ( 1 + (9/43200) * 86340 )^2 ~= 149.34
    

    So yesterday's views have almost completely lost their influence on the rating within 24 hours. Of course, you can tune k, or change the attenuation formula altogether, to suit your needs. This is just an example.
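The worked numbers can be checked with a few lines of Python. This is only a sketch: the function names are invented here, and it uses the squared form of the attenuation that the numeric examples rely on.

```python
def attenuation(seconds_since_midnight: float, k: float) -> float:
    """Weight applied to yesterday's views; 1.0 at midnight, shrinking over the day."""
    return 1.0 / (1.0 + k * seconds_since_midnight) ** 2


def daily_rating(today: int, yesterday: int, seconds_since_midnight: float, k: float) -> float:
    """Today's views plus yesterday's views faded by the attenuation."""
    return today + yesterday * attenuation(seconds_since_midnight, k)


K = 9 / 43200  # yesterday's views fade to 1/100 of their weight after 12 hours

print(round(daily_rating(0, 123, 0, K), 2))        # midnight: 123.0
print(round(daily_rating(89, 123, 43200, K), 2))   # midday:   90.23
print(round(daily_rating(149, 123, 86340, K), 2))  # 23:59:    149.34
```

The same function would serve the weekly and monthly ratings by passing ThisWeek/LastWeek or ThisMonth/LastMonth and the time elapsed since the start of that period, presumably with a smaller k.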

  • 2021-01-02 12:08

    This is a common problem: how to aggregate data efficiently while still keeping all the information you need.

    First of all: Did you try doing it your way? Did you really lack the storage? Your solution seems reasonable.

    How I would do it

    I assume that you are using a database for keeping the data.

    I would create two separate tables, one for hourly and one for daily statistics. Each entry would have exactly 24 rows in the hourly table, one per hour. To update a row you only need to know the hour (0-23) and the entry_id, for example (hourly_stats is a placeholder table name): UPDATE hourly_stats SET count = count + 1 WHERE hour = 11 AND entry_id = 18164;

    entry_id foreign key | hour integer | count integer
    ---------------------+--------------+--------------
    1                    | 0            | 123
    1                    | 2            | 1712
    ...
    

    The current day's stats would either be computed around midnight (or whenever the app is least busy) or summed on demand. Either way, once per day the hourly data has to be summed and the result inserted into the daily stats table.

    entry_id foreign key | day date   | count integer
    ---------------------+------------+--------------
    1                    | 2013-07-03 | 54197
    1                    | 2013-07-04 | 66123
    ...
    

    Daily rows older than 31 (30/29/28) days should be deleted, unless you also want total or yearly statistics.
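To make the two-table layout concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names `hourly_stats` and `daily_stats`, the function names, and the choice to reset the hourly counters after the rollup are all assumptions made for illustration; a real deployment would likely use its own database and a cron job or trigger.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE hourly_stats (
            entry_id INTEGER NOT NULL,
            hour     INTEGER NOT NULL CHECK (hour BETWEEN 0 AND 23),
            count    INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (entry_id, hour)      -- doubles as the lookup index
        );
        CREATE TABLE daily_stats (
            entry_id INTEGER NOT NULL,
            day      TEXT    NOT NULL,        -- ISO date such as 2013-07-03
            count    INTEGER NOT NULL,
            PRIMARY KEY (entry_id, day)
        );
    """)


def add_entry(conn: sqlite3.Connection, entry_id: int) -> None:
    # Every entry gets exactly 24 hourly rows up front.
    conn.executemany(
        "INSERT INTO hourly_stats (entry_id, hour, count) VALUES (?, ?, 0)",
        [(entry_id, hour) for hour in range(24)],
    )


def record_view(conn: sqlite3.Connection, entry_id: int, hour: int) -> None:
    conn.execute(
        "UPDATE hourly_stats SET count = count + 1 WHERE entry_id = ? AND hour = ?",
        (entry_id, hour),
    )


def roll_up_day(conn: sqlite3.Connection, day: str, keep_days: int = 31) -> None:
    """Once-per-day job: store the day's total, reset hours, prune old days."""
    with conn:  # run the three statements in one transaction
        conn.execute(
            "INSERT INTO daily_stats (entry_id, day, count) "
            "SELECT entry_id, ?, SUM(count) FROM hourly_stats GROUP BY entry_id",
            (day,),
        )
        # Assumption: hourly counters start from zero for the next day.
        conn.execute("UPDATE hourly_stats SET count = 0")
        conn.execute(
            "DELETE FROM daily_stats WHERE day < date(?, ?)",
            (day, f"-{keep_days} days"),
        )
```

With this layout, the weekly and monthly scores are plain `SUM(count)` queries over `daily_stats` restricted to the last 7 or 31 days.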

    Advantages

    • you keep less data than with full hourly stats: 24 + 31 rows per entry rather than 24 × 31
    • sums on the hourly table should be fast if it is indexed on entry_id and hour
    • less memory used than in your solution

    Disadvantages

    • additional scripting/triggers/jobs are required to update the statistics daily
    • more work required to implement it than in your solution