A cohort analysis groups users based on certain traits and compares their behavior across groups and over time.
The most common way of grouping users is by when they first joined. This definition will change for each company and even use case but it’s generally taken to be the day or week that a user joined. All users who joined at the same time are grouped together. Again, this can be made more complex if needed.
The simplest way to start comparing cohorts is seeing if they returned to your site/product in the weeks after they joined. You can continue to refine what patterns of behavior you’re looking for.
But for this example, we’ll do the following:
- Assign users a cohort based on the week they created their account
- For each cohort determine the % of users that returned for the 10 weeks after joining
- The way your data is structured will greatly determine how you do your analysis. Most commonly, there’s a transaction table that has timestamped events such as ‘create account’ or ‘purchase’ for each user. This is what we’ll use for our example.
- The data comes from BigQuery’s Example Google Analytics dataset.
- In this example I’ve gone step by step, but to do it all in one query you could try:
WITH cohorts as ( SELECT fullVisitorId, date_trunc(first_value(date) over (partition by fullVisitorId order by date asc),week) cohort FROM data ), activity as ( SELECT cohorts.cohort, date_diff(date,cohort,week) weeks_since_joining, count(distinct data.fullVisitorId) visitors FROM cohorts LEFT JOIN data ON cohorts.fullVisitorId = data.fullVisitorId WHERE date_diff(date,cohort,week) between 1 and 10 GROUP BY cohort, weeks_since_joining ), cohorts_with_weeks as ( SELECT distinct cohort, count(distinct fullVisitorId) cohort_size, weeks_since_joining FROM cohorts CROSS JOIN UNNEST(generate_array(1,10)) weeks_since_joining GROUP BY cohort, weeks_since_joining ), SELECT cohorts_with_weeks.cohort, cohorts_with_weeks.weeks_since_joining, cohorts_with_weeks.cohort_size, coalesce(activity.visitors,0)/cohorts_with_weeks.cohort_size per_active, FROM cohorts_with_weeks LEFT JOIN activity ON cohorts_with_weeks.cohort = activity.cohort AND cohorts_with_weeks.weeks_since_joining = activity.weeks_since_joining
3. Turn that into a percentage of users each week (being sure to include weeks where no users returned)
Now we can see our cohort table.