Intelligent MySQL GROUP BY for Activity Streams

长发绾君心 2020-12-13 18:52

I'm building an activity stream for our site, and have made some decent headway with something that works pretty well.

It's powered by two tables:

4 Answers
  • 2020-12-13 19:27

    My impression is you need to group by user, as you do, but also, after that grouping, by action.

    It looks to me like you need a subquery like this:

    -- Selecting non-grouped columns requires ONLY_FULL_GROUP_BY to be disabled
    SELECT user_groups.*, -- or whatever columns
           SUM(actions_in_user_group) AS total_rows_in_group,
           GROUP_CONCAT(actions_in_user_collection) AS complete_collection
      FROM
         ( SELECT stream.*, -- or whatever columns
                  COUNT(stream.id) AS actions_in_user_group,
                  GROUP_CONCAT(stream.id) AS actions_in_user_collection
             FROM stream
            INNER JOIN follows
               ON stream.user_id = follows.following_user
            WHERE follows.user_id = '1'
              AND stream.hidden = '0'
            GROUP BY stream.user_id,
                     DATE(stream.stream_date)
         ) AS user_groups -- MySQL requires an alias on every derived table
     GROUP BY user_groups.object_id,
              DATE(user_groups.stream_date)
     ORDER BY user_groups.stream_date DESC;
    

    Your initial query (now the inner one) groups by user, but then the user groups are regrouped by identical actions - that is, identical products bought or sales from one seller would be put together.

  • 2020-12-13 19:27

    Over at Fashiolista we've open-sourced our approach to building feed systems at https://github.com/tschellenbach/Feedly (written in Python). It's currently the largest open source library aimed at solving this problem.

    The same team that built Feedly also offers a hosted API, which handles the complexity for you. Have a look at getstream.io. There are clients for PHP, Node, Ruby and Python (for PHP, see https://github.com/tbarbugli/stream-php). It also supports custom-defined aggregations, which is what you are looking for.

    In addition, have a look at this High Scalability post where we explain some of the design decisions involved: http://highscalability.com/blog/2013/10/28/design-decisions-for-scaling-your-high-traffic-feeds.html

    This tutorial will help you set up a system like Pinterest's feed using Redis. It's quite easy to get started with.

    To learn more about feed design I highly recommend reading some of the articles which we based Feedly on:

    • Yahoo Research Paper
    • Twitter 2013 Redis based, with fallback
    • Cassandra at Instagram
    • Etsy feed scaling
    • Facebook history
    • Django project, with good naming conventions. (But database only)
    • http://activitystrea.ms/specs/atom/1.0/ (actor, verb, object, target)
    • Quora post on best practises
    • Quora scaling a social network feed
    • Redis ruby example
    • FriendFeed approach
    • Thoonk setup
    • Twitter's Approach
  • 2020-12-13 19:31

    Some observations about your desired results:

    Some of the items are aggregated (Jack Sprat hearted seven sellers) and others are itemized (Lord Nelson chartered the Golden Hind). You probably need to have a UNION in your query that pulls together these two classes of items from two separate subqueries.
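
    A minimal sketch of that UNION idea follows. It uses Python's built-in sqlite3 module only so that it runs without a MySQL server. The `verb` column and the sample rows are assumptions made for illustration and are not taken from the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stream (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER,
    verb        TEXT,      -- hypothetical column naming the kind of action
    object_id   INTEGER,
    stream_date TEXT
);
INSERT INTO stream (user_id, verb, object_id, stream_date) VALUES
    (1, 'hearted',   10, '2020-12-13 09:00:00'),
    (1, 'hearted',   11, '2020-12-13 09:05:00'),
    (1, 'hearted',   12, '2020-12-13 10:30:00'),
    (2, 'chartered', 20, '2020-12-13 11:00:00'),
    (2, 'chartered', 21, '2020-12-13 12:00:00');
""")

feed = conn.execute("""
    -- Class 1: actions collapsed into one row per user, verb and day
    SELECT user_id, verb,
           COUNT(*)                AS n,
           GROUP_CONCAT(object_id) AS objects,
           MAX(stream_date)        AS latest
      FROM stream
     WHERE verb = 'hearted'
     GROUP BY user_id, verb, DATE(stream_date)
    UNION ALL
    -- Class 2: actions listed one row each
    -- (in MySQL the cast would be written CAST(object_id AS CHAR))
    SELECT user_id, verb, 1, CAST(object_id AS TEXT), stream_date
      FROM stream
     WHERE verb = 'chartered'
     ORDER BY latest DESC
""").fetchall()

for row in feed:
    print(row)
```

    Both branches return the same column list, so the final ORDER BY sorts the combined feed by the most recent timestamp in each row.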

    You use a fairly crude timestamp-nearness function, DATE(), to group your items. You may want a more sophisticated and tweakable scheme, perhaps something like this:

      GROUP BY TIMESTAMPDIFF(HOUR, stream_date, NOW()) DIV hourchunk
    

    This will let you group items into age chunks. For example, if you use 48 for hourchunk, everything less than 48 hours old is grouped together, everything 48 to 95 hours old forms the next group, and so on. As you add traffic and activity to your system, you may want to decrease the hourchunk value.
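
    To make the bucket arithmetic concrete, here is a small Python sketch of the same calculation: the event's age in whole hours, integer-divided by hourchunk. The function names and the sample events are hypothetical, and the sketch assumes no event is dated in the future.

```python
from collections import defaultdict
from datetime import datetime


def age_bucket(stream_date, now, hourchunk=48):
    """Age of an event in whole hours, integer-divided by hourchunk.

    Assumes stream_date is not later than now.
    """
    age_hours = int((now - stream_date).total_seconds() // 3600)
    return age_hours // hourchunk


def group_by_age(events, now, hourchunk=48):
    """Group (user_id, stream_date) pairs by (user_id, age bucket)."""
    groups = defaultdict(list)
    for user_id, stream_date in events:
        key = (user_id, age_bucket(stream_date, now, hourchunk))
        groups[key].append(stream_date)
    return dict(groups)


now = datetime(2020, 12, 13, 19, 0)
events = [
    (1, datetime(2020, 12, 13, 18, 0)),  # 1 hour old
    (1, datetime(2020, 12, 12, 18, 0)),  # 25 hours old
    (1, datetime(2020, 12, 10, 18, 0)),  # 73 hours old
]
for key, dates in sorted(group_by_age(events, now).items()):
    print(key, len(dates))
```

    With hourchunk set to 48, the first two events share bucket 0 and the third lands in bucket 1. Lowering hourchunk to 24 would split the first two apart.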

  • 2020-12-13 19:39

    We resolved a similar issue with a 'materialized view' approach: a dedicated table that is updated on every insert/update/delete event. All user activities are logged into this table, already prepared for simple selection and rendering.

    The benefit is simple and fast selection. The drawback is slightly slower insert/update/delete operations, since the log table has to be updated as well.

    If the system is well designed, it is a winning solution.

    This is quite easy to implement if you are using an ORM with post insert/update/delete events (like Doctrine).
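
    As a rough illustration of the idea, the sketch below keeps a pre-aggregated table in sync with a source table. Database triggers stand in for the ORM's post insert/delete events, and Python's built-in sqlite3 module is used only so the example runs without a server. The `hearts` and `activity_feed` tables, their columns and the trigger names are all hypothetical. In MySQL the upsert would typically be written with INSERT ... ON DUPLICATE KEY UPDATE instead of INSERT OR IGNORE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hearts (              -- hypothetical source table
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    seller_id  INTEGER NOT NULL,
    created_at TEXT NOT NULL
);
CREATE TABLE activity_feed (       -- hypothetical pre-aggregated feed table
    user_id      INTEGER NOT NULL,
    verb         TEXT NOT NULL,
    day          TEXT NOT NULL,
    action_count INTEGER NOT NULL,
    PRIMARY KEY (user_id, verb, day)
);

-- Keep the feed row for (user, verb, day) up to date on every insert
CREATE TRIGGER hearts_after_insert AFTER INSERT ON hearts
BEGIN
    INSERT OR IGNORE INTO activity_feed (user_id, verb, day, action_count)
    VALUES (NEW.user_id, 'hearted', DATE(NEW.created_at), 0);
    UPDATE activity_feed
       SET action_count = action_count + 1
     WHERE user_id = NEW.user_id
       AND verb = 'hearted'
       AND day = DATE(NEW.created_at);
END;

-- Undo the count on delete and drop feed rows that become empty
CREATE TRIGGER hearts_after_delete AFTER DELETE ON hearts
BEGIN
    UPDATE activity_feed
       SET action_count = action_count - 1
     WHERE user_id = OLD.user_id
       AND verb = 'hearted'
       AND day = DATE(OLD.created_at);
    DELETE FROM activity_feed WHERE action_count <= 0;
END;
""")

conn.executemany(
    "INSERT INTO hearts (user_id, seller_id, created_at) VALUES (?, ?, ?)",
    [
        (1, 10, "2020-12-13 09:00:00"),
        (1, 11, "2020-12-13 09:05:00"),
        (2, 10, "2020-12-13 10:00:00"),
    ],
)
conn.execute("DELETE FROM hearts WHERE user_id = 2")

# Reading the feed is now a plain SELECT with no GROUP BY
feed_rows = conn.execute(
    "SELECT user_id, verb, day, action_count FROM activity_feed ORDER BY user_id"
).fetchall()
print(feed_rows)
```

    The extra write on each insert and delete is the cost mentioned above. In exchange, rendering the stream never has to aggregate at read time.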
