I'm building an activity stream for our site, and have made some decent headway with something that works pretty well.
It's powered by two tables:
My impression is you need to group by user, as you do, but also, after that grouping, by action.
It looks to me like you need a subquery like this:
SELECT *, -- or whatever columns
       SUM(actions_in_user_group) AS total_rows_in_group,
       GROUP_CONCAT(actions_in_user_collection) AS complete_collection
FROM
  ( SELECT stream.*, -- or whatever columns
           COUNT(stream.id) AS actions_in_user_group,
           GROUP_CONCAT(stream.id) AS actions_in_user_collection
    FROM stream
    INNER JOIN follows
       ON stream.user_id = follows.following_user
    WHERE follows.user_id = '1'
      AND stream.hidden = '0'
    GROUP BY stream.user_id,
             date(stream.stream_date)
  ) AS user_groups -- MySQL requires an alias on a derived table
GROUP BY object_id,
         date(stream_date)
ORDER BY stream_date DESC;
Your initial query (now the inner one) groups by user, but then the user groups are regrouped by identical actions - that is, identical products bought or sales from one seller would be put together.
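To try the two-level grouping end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table layout and sample rows are assumptions inferred from the column names in the query above, not your real schema. Explicit MIN()/MAX() aggregates replace the `*` so the result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stream (id INTEGER PRIMARY KEY, user_id INTEGER, object_id INTEGER,
                     stream_date TEXT, hidden INTEGER);
CREATE TABLE follows (user_id INTEGER, following_user INTEGER);

-- user 1 follows users 2, 3 and 4 (but not 5)
INSERT INTO follows VALUES (1, 2), (1, 3), (1, 4);

INSERT INTO stream VALUES
  (1, 2, 10, '2013-10-01 09:00:00', 0),
  (2, 2, 10, '2013-10-01 10:00:00', 0),
  (3, 3, 10, '2013-10-01 11:00:00', 0),
  (4, 4, 20, '2013-10-01 12:00:00', 0),
  (5, 4, 20, '2013-10-02 08:00:00', 0),
  (6, 5, 10, '2013-10-01 13:00:00', 0),  -- not followed, filtered out
  (7, 3, 10, '2013-10-01 14:00:00', 1);  -- hidden, filtered out
""")

rows = conn.execute("""
SELECT object_id,
       date(latest) AS day,
       SUM(actions_in_user_group) AS total_rows_in_group,
       GROUP_CONCAT(actions_in_user_collection) AS complete_collection
FROM (
    -- first level: one row per followed user per day
    SELECT stream.user_id,
           MIN(stream.object_id) AS object_id,
           MAX(stream.stream_date) AS latest,
           COUNT(stream.id) AS actions_in_user_group,
           GROUP_CONCAT(stream.id) AS actions_in_user_collection
    FROM stream
    INNER JOIN follows ON stream.user_id = follows.following_user
    WHERE follows.user_id = 1 AND stream.hidden = 0
    GROUP BY stream.user_id, date(stream.stream_date)
) AS user_groups
-- second level: merge user groups that touched the same object that day
GROUP BY object_id, date(latest)
ORDER BY MAX(latest) DESC
""").fetchall()

# GROUP_CONCAT order is not guaranteed, so sort the ids before using them.
feed = [(obj, day, total, sorted(int(i) for i in ids.split(",")))
        for obj, day, total, ids in rows]

for entry in feed:
    print(entry)
```

Note that grouping the inner query by user and day collapses a user's actions on different objects into one row. The sample data avoids that case, but real data may need `object_id` in the inner GROUP BY as well.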
Over at Fashiolista we've open-sourced our approach to building feed systems as Feedly (https://github.com/tschellenbach/Feedly). It's currently the largest open source library aimed at solving this problem (though it is written in Python).
The same team that built Feedly also offers a hosted API, which handles the complexity for you. Have a look at getstream.io. There are clients for PHP (https://github.com/tbarbugli/stream-php), Node, Ruby and Python. It also supports custom-defined aggregations, which is what you are looking for.
In addition, have a look at this High Scalability post where we explain some of the design decisions involved: http://highscalability.com/blog/2013/10/28/design-decisions-for-scaling-your-high-traffic-feeds.html
This tutorial will help you set up a system like Pinterest's feed using Redis. It's quite easy to get started with.
To learn more about feed design I highly recommend reading some of the articles which we based Feedly on:
Some observations about your desired results:
Some of the items are aggregated (Jack Sprat hearted seven sellers) and others are itemized (Lord Nelson chartered the Golden Hind). You probably need to have a UNION in your query that pulls together these two classes of items from two separate subqueries.
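A rough sketch of that UNION idea, again using Python's sqlite3 in place of MySQL. The `stream` layout here (`user_name`, `action`, `object_name`) is hypothetical and only serves the illustration. 'heart' actions are aggregated per user and day, while 'charter' actions stay itemized. It uses UNION ALL because the two subqueries cannot overlap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stream (id INTEGER PRIMARY KEY, user_name TEXT, action TEXT,
                     object_name TEXT, stream_date TEXT);
INSERT INTO stream VALUES
  (1, 'Jack Sprat',  'heart',   'Seller A',    '2013-10-01 09:00:00'),
  (2, 'Jack Sprat',  'heart',   'Seller B',    '2013-10-01 09:05:00'),
  (3, 'Jack Sprat',  'heart',   'Seller C',    '2013-10-01 09:10:00'),
  (4, 'Lord Nelson', 'charter', 'Golden Hind', '2013-10-01 10:00:00');
""")

rows = conn.execute("""
-- aggregated class: one row per user per day
SELECT user_name, action, COUNT(*) AS n, NULL AS object_name,
       MAX(stream_date) AS latest
FROM stream
WHERE action = 'heart'
GROUP BY user_name, date(stream_date)

UNION ALL

-- itemized class: one row per action
SELECT user_name, action, 1, object_name, stream_date
FROM stream
WHERE action = 'charter'

ORDER BY latest DESC
""").fetchall()


def render(user_name, action, n, object_name):
    """Turn one feed row into a display line."""
    if object_name is None:  # aggregated row
        return f"{user_name} hearted {n} sellers"
    return f"{user_name} chartered the {object_name}"


lines = [render(u, a, n, o) for u, a, n, o, _latest in rows]
for line in lines:
    print(line)
```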
You use a fairly crude timestamp-nearness function to group your items: DATE(). You may want to use a more sophisticated and tweakable scheme, like this, maybe:
GROUP BY TIMESTAMPDIFF(HOUR, NOW(), stream_date) DIV hourchunk
This will let you group stuff by age chunks. For example, if you use 48 for hourchunk you'll group stuff that's 0-48 hours ago together. As you add traffic and action to your system you may want to decrease the hourchunk value.
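To see how the chunk size changes the grouping, here is a small Python sketch of the same arithmetic (whole hours of age, integer-divided by the chunk size). The event ids and timestamps are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def age_chunk(stream_date, now, hourchunk):
    """Whole hours between stream_date and now, integer-divided by hourchunk."""
    age_hours = int((now - stream_date).total_seconds() // 3600)
    return age_hours // hourchunk


def group_by_chunk(events, now, hourchunk):
    """Map chunk number -> list of event ids, keeping input order."""
    chunks = defaultdict(list)
    for event_id, stream_date in events:
        chunks[age_chunk(stream_date, now, hourchunk)].append(event_id)
    return dict(chunks)


now = datetime(2013, 10, 3, 12, 0, 0)
events = [
    ("a", now - timedelta(hours=1)),
    ("b", now - timedelta(hours=47)),
    ("c", now - timedelta(hours=48)),
    ("d", now - timedelta(hours=100)),
]

chunks_48 = group_by_chunk(events, now, 48)
chunks_24 = group_by_chunk(events, now, 24)
print(chunks_48)
print(chunks_24)
```

With a chunk of 48, ages of 0 to 47 whole hours share a group and 48 starts the next one. Halving the chunk to 24 splits "a" and "b" apart.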
We resolved a similar issue with a 'materialized view' approach: a dedicated table that gets updated on every insert/update/delete event. All user activities are logged into this table, already prepared for simple selection and rendering.
The benefit is simple and fast selection; the drawback is slightly slower insert/update/delete, since the log table has to be updated as well.
If the system is well designed, it is a winning solution.
This is quite easy to implement if you are using an ORM with post-insert/update/delete events (like Doctrine).
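As a rough illustration of the idea, the sketch below keeps a pre-rendered log table in sync with a source table. It is not our actual setup: it uses database triggers in SQLite (via Python's sqlite3) instead of ORM events, and the `purchases` and `activity_log` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (id INTEGER PRIMARY KEY, user_name TEXT, product TEXT);

-- The 'materialized view': one pre-rendered row per activity.
CREATE TABLE activity_log (purchase_id INTEGER PRIMARY KEY, summary TEXT);

CREATE TRIGGER purchases_after_insert AFTER INSERT ON purchases BEGIN
    INSERT INTO activity_log
    VALUES (NEW.id, NEW.user_name || ' bought ' || NEW.product);
END;

CREATE TRIGGER purchases_after_update AFTER UPDATE ON purchases BEGIN
    UPDATE activity_log
    SET summary = NEW.user_name || ' bought ' || NEW.product
    WHERE purchase_id = NEW.id;
END;

CREATE TRIGGER purchases_after_delete AFTER DELETE ON purchases BEGIN
    DELETE FROM activity_log WHERE purchase_id = OLD.id;
END;
""")

# Writes touch only the source table; the triggers maintain the log.
conn.execute("INSERT INTO purchases VALUES (1, 'alice', 'hat')")
conn.execute("INSERT INTO purchases VALUES (2, 'bob', 'scarf')")
conn.execute("UPDATE purchases SET product = 'cap' WHERE id = 1")
conn.execute("DELETE FROM purchases WHERE id = 2")

# Reading the feed is a plain select with no joins or grouping.
feed = [row[0] for row in conn.execute(
    "SELECT summary FROM activity_log ORDER BY purchase_id DESC")]
print(feed)
```

Every write now costs one extra statement, which is the slower insert/update/delete mentioned above.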