Cohort Analysis

Cohort Analysis

Description

A cohort analysis groups users based on certain traits and compares their behavior across groups and over time.
The most common way of grouping users is by when they first joined. This definition will change for each company and even use case but it’s generally taken to be the day or week that a user joined. All users who joined at the same time are grouped together. Again, this can be made more complex if needed.
The simplest way to start comparing cohorts is seeing if they returned to your site/product in the weeks after they joined. You can continue to refine what patterns of behavior you’re looking for.
But for this example, we’ll do the following:

  • Assign users a cohort based on the week they created their account
  • For each cohort determine the % of users that returned for the 10 weeks after joining

The Data

  • The way your data is structured will greatly determine how you do your analysis. Most commonly, there’s a transaction table that has timestamped events such as ‘create account’ or ‘purchase’ for each user. This is what we’ll use for our example.
  • The data comes from BigQuery’s Example Google Analytics dataset.

The Code

  • In this example I’ve gone step by step, but to do it all in one query you could try:
WITH cohorts as 
(
   SELECT 
      fullVisitorId,
      date_trunc(first_value(date) over (partition by fullVisitorId order by date asc),week) cohort
FROM data
),
 activity as 
(
   SELECT
      cohorts.cohort,
      date_diff(date,cohort,week) weeks_since_joining,
      count(distinct data.fullVisitorId) visitors
   FROM
      cohorts
   LEFT JOIN data ON cohorts.fullVisitorId = data.fullVisitorId
   WHERE 
      date_diff(date,cohort,week) between 1 and 10
   GROUP BY cohort, weeks_since_joining
),
 cohorts_with_weeks as 
(
   SELECT distinct
      cohort,
      count(distinct fullVisitorId) cohort_size,
      weeks_since_joining
   FROM cohorts
   CROSS JOIN
   UNNEST(generate_array(1,10)) weeks_since_joining
   GROUP BY cohort, weeks_since_joining
),
SELECT
   cohorts_with_weeks.cohort,
   cohorts_with_weeks.weeks_since_joining,
   cohorts_with_weeks.cohort_size,
   coalesce(activity.visitors,0)/cohorts_with_weeks.cohort_size per_active,
FROM
   cohorts_with_weeks
LEFT JOIN activity 
   ON cohorts_with_weeks.cohort = activity.cohort 
   AND cohorts_with_weeks.weeks_since_joining = activity.weeks_since_joining

Data Preview

1. Assign each user a cohort based on the week they first appeared

2. For each cohort find the number of users that have returned for weeks 1-10

3. Turn that into a percentage of users each week (being sure to include weeks where no users returned)

Now we can see our cohort table.

1 Like