PHP forums - how to cope with unread discussions / topics / posts

后端 未结 7 1052
伪装坚强ぢ
伪装坚强ぢ 2020-12-07 17:10

I know this question has been asked here a couple of times, but none of the answers had pleased me. This is because almost all of them involve a huge read / write process re

7条回答
  •  长情又很酷
    2020-12-07 17:29

    There is... another.

    Another way to store detailed read/unread data for a hierarchical forum structure ( board > section > thread, etc.). It does so without a) having to pre-populate read/unread information, and b) without having to ever store more than U*(M/2) rows in its worst case, where U is the number of users, and M is the total number of posts in the database (and usually much, much less than this)

    I researched this topic a while ago. I found that SMF/phpBB "cheat" a little bit in how they store user reading history. Their schema supports storage of either the last timestamps or message ID that was marked as read in a given board, forum, subforum, topic (or viewed directly by the browser), like so:

    [ user_id, board, last_msg_id, last_timestamp ]

    [ user_id, board, forum, last_msg_id, last_timestamp ]

    [ user_id, board, forum, subforum, last_msg_id, last_timestamp ]

    [ user_id, board, forum, subforum, topic, last_msg_id, last_timestamp ]

    This lets users mark specific boards, forums, topics, etc., as "read". It requires, however, either action on the part of the user (either by reading, or actively clicking "mark as read"), and in the case of phpBB, doesn't give you the granularity to say "I have seen this specific message, but not that specific message." You also get the situation where you read the last message in a topic first (viewing the latest activity in a thread), and you're immediately assumed to have read the rest of the thread.

    It works for SMF and phpBB to store things like this because it's rare that you're ever viewing just one post (default views are set up for 20+ posts in the last page of a topic). However, for more threaded forums (particularly forums where you're viewing messages one at a time), this is less than ideal. Users of this system will likely care a whole lot if they have read one message but not another, and might consider it cumbersome to only be able to mark an entire section as read, when really they just wanted a few marked as read.

    You store messages in tuples like this: [ user_id, lower_msg_id, upper_msg_id ]

    The user history log is maintained as the following:

    Upon page view, a function looks to see if user_id has a record where current_msg_id is between lower_msg_id and upper_msg_id. If it has, then this page is read, and no action needs taken. If it hasn't, then another query has to be issued, this time determining if current_msg_id is either one less than lower_msg_id (current_msg_id == lower_msg_id-1), or one more than upper_msg_id (current_msg_id == upper_msg_id +1). This is the case where we grow our "read" or "seen" boundary by 1. If we're one away from lower_msg_id or uppper_msg_id, then we grow the tuple by 1 in that direction. If we aren't growing our tuple range, then we insert a new tuple, [ user_id, current_msg_id, current_msg_id ].

    Corner case is when two tuple ranges approach each other. In this case, upon searching between the lower tuple boundary and the upper tuple boundary, merge the two boundaries by setting the upper boundary of the lower tuple to the upper boundary of the upper tuple, and delete the upper tuple.

    Code example in PHP:

    function seen_bounds( $usr_id, $msg_id ) {
    
        # mysql escape
        $usr_id = mres( $usr_id );
        $msg_id = mres( $msg_id );
    
        $seen_query = "
            SELECT
                msb.id,
                msb.lower_msg_id,
                msb.upper_msg_id
            FROM
                msgs_seen_bounds msb
            WHERE
                $msg_id BETWEEN msb.lower_msg_id AND msb.upper_msg_id AND
                msb.usr_id = $usr_id
            LIMIT 1;
        ";
    
        # See if this post already exists within a given
        # seen bound.
        $seen_row = query($seen_query, ROW);
    
        if($seen_row == 0) {
            # Has not been seen, try to detect if we're "near"
            # another bound (and we can grow that bound to include
            # this post).
            $lower_query = "
                SELECT
                    msb.id,
                    msb.lower_msg_id,
                    msb.upper_msg_id
                FROM
                    msgs_seen_bounds msb
                WHERE
                    msb.upper_msg_id = ($msg_id - 1) AND
                    msb.usr_id = $usr_id
                LIMIT 1;
            ";
    
            $upper_query = "
                SELECT
                    msb.id,
                    msb.lower_msg_id,
                    msb.upper_msg_id
                FROM
                    msgs_seen_bounds msb
                WHERE
                    msb.lower_msg_id = ($msg_id + 1) AND
                    msb.usr_id = $usr_id
                LIMIT 1;
            ";
    
            $lower = query($lower_query, ROW);
            $upper = query($upper_query, ROW);
    
            if( $lower == 0 && $upper == 0 ) {
                # No bounds exist for or near this. We'll insert a single-ID
                # bound
    
                $saw_query = "
                    INSERT INTO
                        msgs_seen_bounds
                    (usr_id, lower_msg_id, upper_msg_id)
                    VALUES
                    ($usr_id, $msg_id, $msg_id)
                    ;
                ";
    
                query($saw_query, NONE);
            } else {
                if( $lower != 0 && $upper != 0 ) {
                    # Found "near" bounds both on the upper
                    # and lower bounds.
    
                    $update_query = '
                        UPDATE msgs_seen_bounds
                        SET
                            upper_msg_id = ' . $upper['upper_msg_id'] . '
                        WHERE
                            msgs_seen_bounds.id = ' . $lower['id'] . '
                        ;
                    ';
    
                    $delete_query = '
                        DELETE FROM msgs_seen_bounds
                        WHERE
                            msgs_seen_bounds.id = ' . $upper['id'] . '
                        ;
                    ';
    
                    query($update_query, NONE);
                    query($delete_query, NONE);
                } else {
                    if( $lower != 0 ) {
                        # Only found lower bound, update accordingly.
                        $update_query = '
                            UPDATE msgs_seen_bounds
                            SET
                                upper_msg_id = ' . $msg_id . '
                            WHERE
                                msgs_seen_bounds.id = ' . $lower['id'] . '
                            ;
                        ';
    
                        query($update_query, NONE);
                    }
    
                    if( $upper != 0 ) {
                        # Only found upper bound, update accordingly.
                        $update_query = '
                            UPDATE msgs_seen_bounds
                            SET
                                lower_msg_id = ' . $msg_id . '
                            WHERE
                                msgs_seen_bounds.id = ' . $upper['id'] . '
                            ;
                        ';
    
                        query($update_query, NONE);
                    }
                }
            }
        } else {
            # Do nothing, already seen.
        }
    
    }
    

    Searching for unread posts is finding where current_msg_id does not exist between any lower_msg_id and upper_msg_id for a given user (a NOT EXISTS query in SQL terms). It's not the most efficient of queries when implementing in a relational database, but can be solved by aggressive indexing. For example, the following is a SQL query for counting unread posts for a given user, grouping by the discussion area ("item") that the posts are in:

    $count_unseen_query = "
        SELECT 
            msgs.item as id,
            count(1) as the_count
        FROM msgs
        WHERE
        msgs.usr != " . $usr_id . " AND
        msgs.state != 'deleted' AND
        NOT EXISTS (
           SELECT 1 
           FROM 
              msgs_seen_bounds msb
           WHERE 
              msgs.id BETWEEN msb.lower_msg_id AND msb.upper_msg_id
              AND msb.usr_id = " . $usr_id . "
        )
        GROUP BY msgs.item
        ;
    

    The more users read on the forum, the wider the bounds marked as read by each tuple, and the less tuples have to be stored. Users can get an accurate count of read vs. unread, and can pretty easily be aggregated to see read vs. unread in each forum, subforum, topic, etc.

    Given a small forum of about 2000+ posts, the following are the usage statistics regarding the number of tuples stored, sorted by number of times users have logged in (approximating user activity). The column "num_bounds" is the number of tuples required to store the user's "num_posts_read" viewing history.

    id  num_log_entries num_bounds num_posts_read num_posts
    479             584         11           2161       228
    118             461          6           2167       724
    487             119         34           2093       199
    499              97          6           2090       309
    476              71        139            481        82
    480              33         92            167        26
    486              33        256            757       154
    496              31        108            193        51
    490              31         80            179        61
    475              28        129            226        47
    491              22         22           1207        24
    502              20        100            232        65
    493              14         73            141         5
    489              14         12           1517        22
    498              10         72            132        17
    

    I haven't seen this particular implementation in any forum but my own custom one, and it's a small one at that. I'd be interested if anybody else has implemented, or seen this implemented elsewhere, particularly in a large and/or active forum.

    Regards,

    Kaiden

提交回复
热议问题