Looking to do some cohort analysis on a userbase. We have 2 tables "users" and "sessions", where users and sessions both have a "created_at" field. I'm looking to f
This answer inverts the output table that @Newy wanted so the cohorts are the rows instead of the columns, and uses absolute dates instead of relative ones.
I was looking for a query that would give me something like this:
    Date        d0  d1  d2  d3  d4  d5  d6
    2016-11-03   3   1   0   0   0   0   0
    2016-11-04   4   2   0   1   0   0   *
    2016-11-05   7   0   1   1   0   *   *
    2016-11-06   7   3   1   1   *   *   *
    2016-11-07  13   5   1   *   *   *   *
    2016-11-08   4   0   *   *   *   *   *
    2016-11-09   1   *   *   *   *   *   *
I wanted the number of users who signed up on a given date, and then how many of those users returned 1 day later, 2 days later, and so on. For example, on 2016-11-07, 13 users signed up and had a session; 5 of those users came back 1 day later, 1 came back 2 days later, and so on.
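To make that definition concrete, here is a minimal sketch of the two tables with a few sample rows. The column types and the rows themselves are assumptions for illustration; only the table names and the id, user_id and created_at columns come from the question and the queries in this answer.

```sql
-- Hypothetical schema: column types are assumed, not taken from the question.
CREATE TABLE users (
    id         INT PRIMARY KEY,
    created_at DATETIME NOT NULL
);

CREATE TABLE sessions (
    user_id    INT NOT NULL,
    created_at DATETIME NOT NULL
);

-- Hypothetical sample data, dated relative to today so it stays
-- inside a "last 7 days" window whenever it is loaded.
INSERT INTO users (id, created_at) VALUES
    (1, CURDATE() - INTERVAL 2 DAY),
    (2, CURDATE() - INTERVAL 2 DAY),
    (3, CURDATE() - INTERVAL 1 DAY);

INSERT INTO sessions (user_id, created_at) VALUES
    (1, CURDATE() - INTERVAL 2 DAY),  -- user 1, signup day (day 0)
    (1, CURDATE() - INTERVAL 1 DAY),  -- user 1, one day after signup
    (2, CURDATE() - INTERVAL 2 DAY),  -- user 2, signup day (day 0)
    (3, CURDATE() - INTERVAL 1 DAY),  -- user 3, signup day (day 0)
    (3, CURDATE());                   -- user 3, one day after signup
```

Under the definition above, the cohort from two days ago has 2 users with a session on day 0 and 1 user who returned on day 1, while yesterday's cohort has 1 user on day 0 and 1 on day 1.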
I took the first subquery of @Andriy M's large query and modified it to give me the date a user signed up, not the days relative to the current date:
    SELECT
        id,
        DATE(created_at) AS DayOffset
    FROM users
    WHERE created_at >= CURDATE() - INTERVAL 6 DAY
Then I modified the LEFT JOIN subquery to look like this:
    SELECT DISTINCT
        sessions.user_id,
        DATEDIFF(sessions.created_at, users.created_at) AS DayOffset
    FROM sessions
    LEFT JOIN users ON (users.id = sessions.user_id)
    WHERE sessions.created_at >= CURDATE() - INTERVAL 6 DAY
I wanted DayOffset to be relative to the date the user signed up, not relative to the current date as in @Andriy M's answer. So I joined the users table to get each user's signup time and took the DATEDIFF between the session time and the signup time.
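It helps to know that MySQL's DATEDIFF only looks at the date parts of its arguments, so the offset counts calendar days since signup rather than 24-hour periods. A quick illustration, using made-up timestamps:

```sql
-- Hypothetical timestamps, chosen only to show how DATEDIFF behaves.
SELECT
    DATEDIFF('2016-11-07 23:50:00', '2016-11-07 08:00:00') AS same_day,  -- 0
    DATEDIFF('2016-11-08 00:10:00', '2016-11-07 23:50:00') AS next_day,  -- 1
    DATEDIFF('2016-11-09 12:00:00', '2016-11-07 12:00:00') AS two_days;  -- 2
```

So a user who signs up at 23:50 and has a session at 00:10 the next morning counts as returning on day 1, even though only 20 minutes passed.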
So the final query looks something like this:
    SELECT
        u.DayOffset AS Date,
        SUM(s.DayOffset = 0) AS d0,
        SUM(s.DayOffset = 1) AS d1,
        SUM(s.DayOffset = 2) AS d2,
        SUM(s.DayOffset = 3) AS d3,
        SUM(s.DayOffset = 4) AS d4,
        SUM(s.DayOffset = 5) AS d5,
        SUM(s.DayOffset = 6) AS d6
    FROM (
        -- u: one row per user who signed up in the last 7 days,
        -- with DayOffset holding the signup date
        SELECT
            id,
            DATE(created_at) AS DayOffset
        FROM users
        WHERE created_at >= CURDATE() - INTERVAL 6 DAY
    ) AS u
    LEFT JOIN (
        -- s: one row per (user, days since signup) with at least one session
        SELECT DISTINCT
            sessions.user_id,
            DATEDIFF(sessions.created_at, users.created_at) AS DayOffset
        FROM sessions
        LEFT JOIN users ON (users.id = sessions.user_id)
        WHERE sessions.created_at >= CURDATE() - INTERVAL 6 DAY
    ) AS s
        ON s.user_id = u.id
    GROUP BY u.DayOffset
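Note that a query of this shape reports 0, not *, for days that have not happened yet. If you want to tell "nobody came back" apart from "that day is still in the future", one possible variation (a sketch only, not something I ran against production data) is to return NULL whenever the cohort date plus n days is later than today:

```sql
-- Sketch: same cohort query, but cells for days still in the future are NULL.
SELECT
    u.DayOffset AS Date,
    SUM(s.DayOffset = 0) AS d0,
    IF(u.DayOffset + INTERVAL 1 DAY <= CURDATE(), SUM(s.DayOffset = 1), NULL) AS d1,
    IF(u.DayOffset + INTERVAL 2 DAY <= CURDATE(), SUM(s.DayOffset = 2), NULL) AS d2,
    IF(u.DayOffset + INTERVAL 3 DAY <= CURDATE(), SUM(s.DayOffset = 3), NULL) AS d3,
    IF(u.DayOffset + INTERVAL 4 DAY <= CURDATE(), SUM(s.DayOffset = 4), NULL) AS d4,
    IF(u.DayOffset + INTERVAL 5 DAY <= CURDATE(), SUM(s.DayOffset = 5), NULL) AS d5,
    IF(u.DayOffset + INTERVAL 6 DAY <= CURDATE(), SUM(s.DayOffset = 6), NULL) AS d6
FROM (
    -- Users who signed up in the last 7 days; DayOffset is the signup date
    SELECT
        id,
        DATE(created_at) AS DayOffset
    FROM users
    WHERE created_at >= CURDATE() - INTERVAL 6 DAY
) AS u
LEFT JOIN (
    -- Distinct (user, days since signup) pairs for recent sessions
    SELECT DISTINCT
        sessions.user_id,
        DATEDIFF(sessions.created_at, users.created_at) AS DayOffset
    FROM sessions
    LEFT JOIN users ON (users.id = sessions.user_id)
    WHERE sessions.created_at >= CURDATE() - INTERVAL 6 DAY
) AS s
    ON s.user_id = u.id
GROUP BY u.DayOffset;
```

The client or reporting layer can then render NULL as * if it needs to match the table at the top exactly.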