Cohort analysis in SQL

慢半拍i 2020-12-09 13:12

Looking to do some cohort analysis on a user base. We have 2 tables, "users" and "sessions", and both have a "created_at" field. I'm looking to f…

4 answers
  •  星月不相逢
    2020-12-09 13:41

    This answer transposes the output table that @Newy wanted, so the cohorts are the rows instead of the columns, and it uses absolute dates instead of relative ones.

    I was looking for a query that would give me something like this:

    Date        d0  d1  d2  d3  d4  d5  d6
    2016-11-03  3   1   0   0   0   0   0
    2016-11-04  4   2   0   1   0   0   *
    2016-11-05  7   0   1   1   0   *   *
    2016-11-06  7   3   1   1   *   *   *
    2016-11-07  13  5   1   *   *   *   *
    2016-11-08  4   0   *   *   *   *   *
    2016-11-09  1   *   *   *   *   *   *
    

    I was looking for the number of users who signed up on a given date, then how many of those users returned 1 day later, 2 days later, and so on. So on 2016-11-07, 13 users signed up and had a session; 5 of those users came back 1 day later, 1 came back 2 days later, etc. An asterisk marks a day that had not happened yet when the query ran, which is why the latest cohort (2016-11-09) only has a d0 value.
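
To pin down what each cell counts, here is a minimal pure-Python sketch of the same definition (not the SQL itself): a cell is the number of distinct users from one signup-date cohort who had at least one session N calendar days after signing up. The sample data and the `cohort_table` helper are hypothetical, and the 7-day window filtering done in the SQL is left out.

```python
from datetime import date

# Hypothetical sample data: user id -> signup date, plus (user id, session date) pairs.
signups = {1: date(2016, 11, 7), 2: date(2016, 11, 7), 3: date(2016, 11, 8)}
sessions = [
    (1, date(2016, 11, 7)),
    (1, date(2016, 11, 7)),  # second session on the same day, counted once
    (2, date(2016, 11, 7)),
    (1, date(2016, 11, 8)),
    (3, date(2016, 11, 8)),
]


def cohort_table(signups, sessions, max_offset=6):
    """Return {signup date: [distinct users active on d0, d1, ..., d<max_offset>]}."""
    # Every cohort gets a row, even if none of its users came back.
    table = {day: [0] * (max_offset + 1) for day in signups.values()}
    # Distinct (user, offset) pairs, mirroring SELECT DISTINCT in the SQL.
    pairs = set()
    for user_id, session_day in sessions:
        if user_id not in signups:
            continue  # session without a known user: no cohort to count it in
        offset = (session_day - signups[user_id]).days
        if 0 <= offset <= max_offset:
            pairs.add((user_id, offset))
    for user_id, offset in pairs:
        table[signups[user_id]][offset] += 1
    return table


for day, counts in sorted(cohort_table(signups, sessions).items()):
    print(day, *counts)
# prints:
# 2016-11-07 2 1 0 0 0 0 0
# 2016-11-08 1 0 0 0 0 0 0
```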

    I took the first subquery of @Andriy M's large query and modified it to give me the date a user signed up, not the days relative to the current date:

    SELECT
        id,
        DATE(created_at) AS DayOffset
      FROM users
      WHERE created_at >= CURDATE() - INTERVAL 6 DAY
    

    Then I modified the LEFT JOIN subquery to look like this:

    SELECT DISTINCT
        sessions.user_id,
        DATEDIFF(sessions.created_at, users.created_at) AS DayOffset
      FROM sessions
      LEFT JOIN users ON (users.id = sessions.user_id)
      WHERE sessions.created_at >= CURDATE() - INTERVAL 6 DAY
    

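    I wanted DayOffset to be relative to the date each user signed up, not to the current date as in @Andriy M's answer. So I left-joined the users table to get the time each user signed up and took the DATEDIFF between the session and that signup time. Because the subquery uses SELECT DISTINCT, a user with several sessions on the same day is counted only once for that offset.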
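
One way to check this offset logic without a MySQL server is to run a port of the subquery against SQLite from Python. This is a sketch under stated assumptions: SQLite stands in for MySQL, timestamps are stored as ISO-8601 text, the difference of `julianday(DATE(...))` values replaces `DATEDIFF()` (which likewise compares calendar dates only), and a bound `:today` parameter replaces `CURDATE()` so the result is reproducible. The data is made up; it shows that a session 15 minutes after a late-night signup already counts as day 1, and that two sessions on the same day collapse into one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; timestamps are ISO-8601 text.
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT);
CREATE TABLE sessions (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
INSERT INTO users VALUES (1, '2016-11-07 23:50:00');
INSERT INTO sessions (user_id, created_at) VALUES
  (1, '2016-11-07 23:55:00'),
  (1, '2016-11-08 00:05:00'),
  (1, '2016-11-08 18:00:00');
""")

# SQLite port of the sessions subquery; :today stands in for CURDATE().
OFFSETS_SQL = """
SELECT DISTINCT
  sessions.user_id,
  CAST(julianday(DATE(sessions.created_at)) - julianday(DATE(users.created_at)) AS INTEGER) AS DayOffset
FROM sessions
LEFT JOIN users ON users.id = sessions.user_id
WHERE sessions.created_at >= DATE(:today, '-6 days')
ORDER BY DayOffset
"""

rows = conn.execute(OFFSETS_SQL, {"today": "2016-11-09"}).fetchall()
print(rows)  # [(1, 0), (1, 1)]
```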

    So the final query looks something like this:

    SELECT u.DayOffset AS Date,  -- u.DayOffset is the signup date (the cohort)
      -- a comparison evaluates to 1 or 0 in MySQL, so each SUM counts matching rows
      SUM(s.DayOffset = 0) AS d0,
      SUM(s.DayOffset = 1) AS d1,
      SUM(s.DayOffset = 2) AS d2,
      SUM(s.DayOffset = 3) AS d3,
      SUM(s.DayOffset = 4) AS d4,
      SUM(s.DayOffset = 5) AS d5,
      SUM(s.DayOffset = 6) AS d6
    FROM (
      -- users who signed up in the last 7 days, with their signup date
      SELECT
        id,
        DATE(created_at) AS DayOffset
      FROM users
      WHERE created_at >= CURDATE() - INTERVAL 6 DAY
    ) AS u
    LEFT JOIN (
      -- distinct (user, days since signup) pairs from recent sessions
      SELECT DISTINCT
        sessions.user_id,
        DATEDIFF(sessions.created_at, users.created_at) AS DayOffset
      FROM sessions
      LEFT JOIN users ON (users.id = sessions.user_id)
      WHERE sessions.created_at >= CURDATE() - INTERVAL 6 DAY
    ) AS s
    ON s.user_id = u.id
    GROUP BY u.DayOffset
    ORDER BY u.DayOffset
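
The final query can be exercised the same way. Below is a hedged SQLite port, not the author's actual setup: it assumes ISO-8601 text timestamps, uses `julianday()` in place of `DATEDIFF()` and a `:today` parameter in place of `CURDATE()`, trims the pivot columns to d0 through d2 to stay short, and uses hypothetical sample rows. Like MySQL, SQLite evaluates a comparison to 1 or 0, so `SUM(s.DayOffset = n)` counts the matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; timestamps are ISO-8601 text so string comparison works.
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT);
CREATE TABLE sessions (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
INSERT INTO users VALUES
  (1, '2016-11-07 09:00:00'),
  (2, '2016-11-07 12:00:00'),
  (3, '2016-11-08 08:00:00');
INSERT INTO sessions (user_id, created_at) VALUES
  (1, '2016-11-07 09:01:00'),
  (1, '2016-11-07 15:00:00'),
  (2, '2016-11-07 12:05:00'),
  (1, '2016-11-08 10:00:00'),
  (2, '2016-11-09 10:00:00'),
  (3, '2016-11-08 08:10:00');
""")

# SQLite port of the final query, trimmed to d0..d2.
PIVOT_SQL = """
SELECT u.DayOffset AS Date,
  SUM(s.DayOffset = 0) AS d0,
  SUM(s.DayOffset = 1) AS d1,
  SUM(s.DayOffset = 2) AS d2
FROM (
  SELECT id, DATE(created_at) AS DayOffset
  FROM users
  WHERE created_at >= DATE(:today, '-6 days')
) AS u
LEFT JOIN (
  SELECT DISTINCT
    sessions.user_id,
    CAST(julianday(DATE(sessions.created_at)) - julianday(DATE(users.created_at)) AS INTEGER) AS DayOffset
  FROM sessions
  LEFT JOIN users ON users.id = sessions.user_id
  WHERE sessions.created_at >= DATE(:today, '-6 days')
) AS s ON s.user_id = u.id
GROUP BY u.DayOffset
ORDER BY u.DayOffset
"""

rows = conn.execute(PIVOT_SQL, {"today": "2016-11-09"}).fetchall()
for row in rows:
    print(*row)
# prints:
# 2016-11-07 2 1 1
# 2016-11-08 1 0 0
```

One caveat applies to the MySQL version too: if no user in a cohort has a session in the window, every column for that cohort comes out NULL rather than 0, because the LEFT JOIN yields NULL offsets and SUM ignores NULLs. Wrapping each SUM in COALESCE(..., 0) avoids that.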
    
