Introducing Theseus, a Python library for cohort analysis

One of the major themes in digital marketing that I’ve commented on over the past two years is the increasingly prominent role of analytics in a marketing operation (the other is the critical importance of creative variation and experimentation).

But what does “analytics” mean in a marketing context? My sense is that it means a marketing team has clear insight into:

  • How different user profiles retain and monetize with the product;
  • The different ad creative elements and themes to which those various profiles respond;
  • The presence of those user profiles in the product over time;
  • The unit economics of those user profiles and the timelines over which their LTV contributions are realized (effectively, ROAS);
  • The optimal budget allocation across channels, campaigns, creative formats, etc. for reaching some combination of those profiles to maximize overall revenue (effectively, a media mix model).

So effective analytics for a marketing team might manifest as the ability to answer questions like, “What will the composition of the user base look like in six months given the retention differences between US-based iOS users acquired on Facebook and US-based Android users acquired via Google UAC?” and “How many users who are at least 30 days old will exist in the user base in one year given our historical acquisition spend?”
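To make the first of those questions concrete, here is a minimal sketch in plain Python. It does not use Theseus; the function names, the power-law retention curve, and every number in it are assumptions invented for illustration. Each segment acquires a constant number of new users per day, and the user base on a future day is the sum of what remains of each daily cohort.

```python
# Hypothetical illustration: these names and numbers are invented, not Theseus's API.

def power_retention(age, a, b):
    """Assumed curve: share of a cohort still active `age` days after install."""
    return 1.0 if age == 0 else min(1.0, a * age ** -b)


def segment_dau(daily_new_users, horizon, a, b, min_age=0):
    """Active users on day `horizon` from cohorts acquired on days 0..horizon,
    counting only users who are at least `min_age` days old."""
    return sum(
        daily_new_users * power_retention(age, a, b)
        for age in range(min_age, horizon + 1)
    )


# Made-up acquisition volumes and retention parameters per segment.
segments = {
    "US / iOS / Facebook": {"daily_new_users": 500, "a": 0.45, "b": 0.40},
    "US / Android / Google UAC": {"daily_new_users": 800, "a": 0.35, "b": 0.55},
}

horizon = 180  # roughly six months out
dau = {name: segment_dau(horizon=horizon, **params) for name, params in segments.items()}
total = sum(dau.values())
shares = {name: value / total for name, value in dau.items()}

for name in segments:
    older = segment_dau(horizon=horizon, min_age=30, **segments[name])
    print(f"{name}: {shares[name]:.1%} of DAU, {older:,.0f} users aged 30+ days")
```

The same loop answers the second question by setting `min_age=30` and extending `horizon` to a year; in practice the constant daily volume would be replaced with actual historical acquisition figures.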

These questions are important because they address the fundamental drivers of growth; marketers rightly focus on ROAS with respect to paid acquisition because it’s the most important measure of ad spend profitability, but products also need to grow. Being able to break a product’s user base apart into source- and retention-based component parts allows marketers to understand how and at what pace their product is actually growing.

To that end, today I am excited to release an open source Python library for cohort analysis called Theseus. Theseus provides a set of easy-to-use functions for building cohort projections, for segmenting cohorts by age and retention characteristics, for calculating required DNU to reach DAU targets, and for conducting general product growth analysis.

Theseus is built on some of the code I released in High growth, low growth, no growth: systematic growth with DAU replacement, but it provides much more functionality. With Theseus, a marketing analyst can:

  • Easily construct a retention profile with retention input data;
  • Project out cohort presence in the product across some timeline;
  • Calculate the new user volumes needed to reach some DAU target given a starting point;
  • Combine cohorts with different retention profiles into DAU estimates over some timeline;
  • Break out cohorts by age (e.g., 30 users that are exactly 7 days old exist in the DAU tomorrow).
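The arithmetic behind the capabilities listed above can be sketched in a few lines of plain Python. This is not Theseus code and does not reflect its actual API: the helper names are hypothetical, the retention profile is built by simple linear interpolation between observed points (held flat after the last one) rather than by whatever fitting method the library uses, and the input numbers are made up.

```python
# Hypothetical sketch; these helpers are NOT Theseus functions.

def build_retention_profile(observed, max_day):
    """Interpolate observed retention, e.g. {1: 0.5, 7: 0.25}, into a
    day-by-day list. Day 0 is 1.0; values stay flat after the last point."""
    points = sorted({0: 1.0, **observed}.items())
    profile = []
    for day in range(max_day + 1):
        left = max(p for p in points if p[0] <= day)
        right = min((p for p in points if p[0] >= day), default=left)
        if right[0] == left[0]:
            profile.append(left[1])
        else:
            weight = (day - left[0]) / (right[0] - left[0])
            profile.append(left[1] + weight * (right[1] - left[1]))
    return profile


def project_dau(dnu_by_day, profile):
    """DAU on each day: every earlier cohort scaled by retention at its age."""
    return [
        sum(dnu_by_day[i] * profile[t - i] for i in range(t + 1))
        for t in range(len(dnu_by_day))
    ]


def dau_by_age(dnu_by_day, profile):
    """Break the final day's DAU out by cohort age: {age_in_days: users}."""
    last = len(dnu_by_day) - 1
    return {last - i: dnu * profile[last - i] for i, dnu in enumerate(dnu_by_day)}


def required_daily_new_users(history, profile, target_dau, days_ahead):
    """Constant DNU over the next `days_ahead` days so that DAU on the last
    of those days reaches `target_dau`, given past DNU in `history`."""
    final_day = len(history) - 1 + days_ahead
    # Users from existing cohorts who are still active on the final day.
    carried = sum(dnu * profile[final_day - i] for i, dnu in enumerate(history))
    # Each unit of future DNU contributes cohorts aged 0..days_ahead-1.
    per_new_user = sum(profile[:days_ahead])
    return max(0.0, (target_dau - carried) / per_new_user)


profile = build_retention_profile({1: 0.5, 7: 0.25, 30: 0.10}, max_day=60)
history = [1000] * 30  # made-up starting point: 30 days at 1,000 DNU
needed = required_daily_new_users(history, profile, target_dau=8000, days_ahead=30)
plan = history + [needed] * 30
print(f"DNU needed: {needed:,.0f}; projected DAU: {project_dau(plan, profile)[-1]:,.0f}")
print(f"Users exactly 7 days old on the final day: {dau_by_age(plan, profile)[7]:,.0f}")
```

Cohorts with different retention profiles can be combined by running `project_dau` once per profile and summing the resulting lists day by day.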

Theseus is designed to be used either for ad hoc analysis in, e.g., a Jupyter Notebook or as part of a business intelligence stack, run in a nightly ETL (e.g., to recreate forward user base projections each night based on that day’s new user and churn activity). Theseus can be used to plan budgets, to optimize a media mix model, or to prioritize product features (e.g., if it is apparent that the user base will, at some point in the future, consist primarily of users older than 90 days who were acquired via paid campaigns, does that impact the product roadmap?).

My personal ambition for Theseus is that, as it gains more functionality over time, it becomes an indispensable tool in the marketing science landscape. A more grounded objective is simply that it helps people make better marketing decisions. (And if I can help in making those decisions via my strategy consulting practice, Heracles, please don’t hesitate to reach out.)

Installation instructions and some simple usage examples for Theseus are available on the Theseus GitHub page. Note that Theseus is in a beta state; bugs are to be expected.